OpenAI says it’s “impossible” to create useful AI models without copyrighted material
(arstechnica.com)
It is about a lawless company doing lawless things. Some of us want companies to follow the spirit, or at least the letter, of the law. We can change the law, but we need to discuss that.
IANAL, why isn't it fair use?
The two big arguments are:
Have you confirmed this yourself?
https://www.cnn.com/2024/01/08/tech/openai-responds-new-york-times-copyright-lawsuit/index.html
The thing is, it doesn't really matter if you have to "manipulate" ChatGPT into spitting out training material word-for-word. The fact that it's possible at all is proof that, intentionally or not, that material has been encoded into the model itself. That might still be fair use, but it's a lot weaker than the original argument, which was that nothing of the original material really remains after training: it's all synthesized and blended with everything else to create something entirely new that doesn't replicate the original.
So that’s a no? Confirming it yourself here means doing it yourself. Have you gotten it to regurgitate a copyrighted work?
You said:
If an AI is trained on a huge number of NYT articles and you're only able to get it to regurgitate one of them, that's not a "substantial portion of the original work." That's a minuscule portion of the original work.