Thursday, May 16, 2024

OpenAI Accused of Using YouTube Content to Train Sora AI Without Google's Permission

OpenAI, the research lab behind the popular ChatGPT chatbot, has been accused of using YouTube content to train its new text-to-video AI model, Sora, without Google's permission.

According to a report by The New York Times, OpenAI used its Whisper speech recognition tool to transcribe more than a million hours of YouTube videos. The Times reported that those transcripts were used to train GPT-4. Whether YouTube videos were also used to train Sora has not been confirmed. If they were, Sora might be able to generate videos that closely resemble, or even reproduce, existing YouTube content.


Google, which owns YouTube, has expressed concern. The company has said that it is "aware of the reports" and that it is "taking them seriously." Google has also said that it is "committed to protecting the rights of creators and ensuring that YouTube is a safe and trusted platform for everyone."


OpenAI has defended its use of YouTube content, arguing that it falls under "fair use." The company has also said that it is committed to working with Google to address any concerns.


This is not the first time that OpenAI has been accused of using data without permission. The company has previously faced criticism for training its GPT-3 language model on text scraped from the internet.


The use of copyrighted content to train AI models is a complex issue. On the one hand, it is important for AI models to be trained on a wide variety of data in order to be effective. On the other hand, it is important to respect the rights of copyright holders.


It is unclear what the long-term implications of this story will be. However, it is likely to raise questions about the ethics of AI development and the need for more regulation in this area.


What do you think about this story? Let me know in the comments below.
