Google Accused of Letting OpenAI Use YouTube Videos to Train GPT-4
In an effort to secure high-quality data to train their AI models, AI companies such as OpenAI, Google, and Meta have reportedly resorted to legally questionable tactics.
According to a New York Times report, OpenAI transcribed more than a million hours of YouTube videos and used the resulting text to train its most advanced large language model (LLM), GPT-4.
OpenAI reportedly developed its Whisper audio transcription model to help the company extract data from YouTube videos. The NY Times reports that OpenAI knew this method might draw scrutiny, but it went ahead anyway because it believed the practice was fair use.
Interestingly, Google, the owner of YouTube, is also suspected of using similar practices to train its own AI models, potentially violating creators' copyrights, as reported by Neowin.
The NY Times report aligns with an earlier report from The Information, which said that OpenAI allegedly scraped data from YouTube videos and podcasts to train two of its AI systems. The report also suggests that OpenAI president Greg Brockman was on the team involved.
When YouTube CEO Neal Mohan was interviewed by Bloomberg, he said that the company's policy "does not allow downloading things like transcripts or bits of video, and that's a clear violation of our terms of service."
However, when asked whether OpenAI had actually used YouTube data, Mohan gave an ambiguous answer, saying, "I have seen reports that the data may or may not be used. I have no information myself."
The NY Times report further claims that some people at Google were aware that OpenAI was copying YouTube data, but they did not act because Google was using the same practice to train its own AI models. Google, however, told the NY Times that it scrapes video data only after the video creator gives consent.
The report also claims that Google asked its team to “change its privacy policy” by June 2023, “so that Google can leverage publicly available Google Docs, restaurant reviews on Google Maps, and other online materials.”