this post was submitted on 09 Aug 2023
379 points (100.0% liked)
Technology
37720 readers
588 users here now
A nice place to discuss rumors, happenings, innovations, and challenges in the technology sphere. We also welcome discussions on the intersections of technology and society. If it’s technological news or discussion of technology, it probably belongs here.
Remember the overriding ethos on Beehaw: Be(e) Nice. Each user you encounter here is a person, and should be treated with kindness (even if they’re wrong, or use a Linux distro you don’t like). Personal attacks will not be tolerated.
Subcommunities on Beehaw:
This community's icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.
founded 2 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
Worth considering that this is already the law in the EU. Specifically, the Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market has exceptions for text and data mining.
Article 3 has a very broad exception for scientific research: "Member States shall provide for an exception to the rights provided for in Article 5(a) and Article 7(1) of Directive 96/9/EC, Article 2 of Directive 2001/29/EC, and Article 15(1) of this Directive for reproductions and extractions made by research organisations and cultural heritage institutions in order to carry out, for the purposes of scientific research, text and data mining of works or other subject matter to which they have lawful access." There is no opt-out clause to this.
Article 4 has a narrower exception for text and data mining in general: "Member States shall provide for an exception or limitation to the rights provided for in Article 5(a) and Article 7(1) of Directive 96/9/EC, Article 2 of Directive 2001/29/EC, Article 4(1)(a) and (b) of Directive 2009/24/EC and Article 15(1) of this Directive for reproductions and extractions of lawfully accessible works and other subject matter for the purposes of text and data mining." This one's narrower because it also provides that, "The exception or limitation provided for in paragraph 1 shall apply on condition that the use of works and other subject matter referred to in that paragraph has not been expressly reserved by their rightholders in an appropriate manner, such as machine-readable means in the case of content made publicly available online."
So, effectively, this means scientific research can data mine freely without rights' holders being able to opt out, and other uses for data mining such as commercial applications can data mine provided there has not been an opt out through machine-readable means.
I think the key problem with a lot of the models right now is that they were developed for "research", without the rights holders having the option to opt out when the models were switched to for-profit. The portfolio and gallery websites, from which the bulk of the artwork came from, didn't even have opt out options until a couple of months ago. Artists were therefore considered to have opted in to their work being used commercially because they were never presented with the option to opt out.
So at the bare minimum, a mechanism needs to be provided for retroactively removing works that would have been opted out of commercial usage if the option had been available and the rights holders had been informed about the commercial intentions of the project. I would favour a complete rebuild of the models that only draws from works that are either in the public domain or whose rights holders have explicitly opted in to their work being used for commercial models.
Basically, you can't deny rights' holders an ability to opt out, and then say "hey, it's not our fault that you didn't opt out, now we can use your stuff to profit ourselves".
Common sense would surely say that becoming a for-profit company or whatever they did would mean they've breached that law. I assume they figured out a way around it or I've misunderstood something though.
I think they just blatantly ignored the law, to be honest. The UK's copyright law is similar, where "fair dealing" allows use for research purposes (legal when the data scrapes were for research), but fair dealing explicitly does not apply when the purpose is commercial in nature and intended to compete with the rights holder. The common sense interpretation is that as soon as the AI models became commercial and were being promoted as a replacement for human-made work, they were intended to be a for profit competition to the rights holders.
If we get to a point where opt outs have full legal weight, I still expect the AI companies to use the data "for research" and then ship the model as a commercial enterprise without any attempt to strip out the works that were only valid to use for research.