In this post the authors explore how today’s contractual restrictions on AI mirror the concerns libraries raised 20 years ago during the US Copyright Office Digital Millennium Copyright Act (DMCA) Section 104 study. Further, they examine the differences between copyright law – which enables access through fair use and other rights – and contracts, which can carry both legal weight and intimidating language, such as copyright warnings.
Actually, there is a small but important difference. Libraries usually don’t generate (much) revenue. They’re funded by public money and institutions… And a subscription is like 25€ annually, or it’s free… AI companies, on the other hand, have billions of dollars in turnover and are very much for-profit. And Fair Use has other applications as well. It enables science, and it lets me record television or listen to music in the car or together with friends. I don’t think we can lump all of this together.
You have to remember, AI training isn’t only for mega-corporations. By setting up barriers that only benefit the ultra-wealthy, you’re handing corporations a monopoly on a public technology by making it prohibitively expensive for regular people to keep up. These companies already own huge datasets and have whatever money they need to buy more. And that’s before they bind users to predatory ToS granting them exclusive access to user data, effectively selling our own data back to us. What some people want would mean the end of open access to competitive, corporate-independent tools and would leave us all worse off, with fewer rights than where we started.
The same people who abuse DMCA takedown requests for their chilling effects on fair use content now need your help to do the same thing to open source AI – their next foe after libraries, students, researchers, and the public domain. Don’t help them do it.
I recommend reading this article by Cory Doctorow, and this open letter by Katherine Klosek, the director of information policy and federal relations at the Association of Research Libraries. I’d like to hear your thoughts.
Those are great links. I think I already read Cory Doctorow’s post.
I think I already struggle with the premise. I think Google, Facebook, etc. using my data is NOT Fair Use. They cannot just publish my full name, pictures and texts without my explicit consent.
And this is kind of lumping everything together again… For-profit AI and open-weight models to the benefit of humanity aren’t the same thing. And I think we should give open-weight models some advantage by applying different rules – i.e. let people use data more freely if they contribute something back and the resulting product can be used freely as well. And make the rules stricter for big, closed, for-profit services. And demand more transparency as well.
I mean realistically, we don’t have any proper rules in place. The AI companies, for example, just pirate everything from Anna’s Archive. And they’re rich enough to afford enough lawyers to get away with it. That’s unlike libraries, which pay for the books and DVDs on their shelves… So that’s definitely illegal by any standard.
But I agree that learning something from a textbook is a different thing than copying it. The resulting knowledge escapes the copyrighted material. I believe that’s the same no matter if it’s machine learning or me learning computer programming with textbooks… The thing is just that you can’t steal in the process. That’s still illegal. IMO.
One of my fears is that AI really is as disruptive as people think, and that the market is going to be dominated by unsympathetic big-tech companies, due to the nature of it. I think we need some good legislation to push AI in the right direction, or we’re going to end up in some sci-fi dystopia where big companies just shape the world and our lives to their liking.
You can make temporary copies of copyrighted materials for fair use applications. I seriously hope there isn’t a state out there that is going to pass laws that gut the core freedoms of art, research, and basic functionality of the internet and computers. If you ban temporary copies like cache, you ban the entire web and likely computers generally, but you never know these days.
Know your rights and don’t be so quick to bandwagon. Consider the motives behind what is being said, especially when it’s two entities like these battling it out.
I’m not that educated on US law and whether everything is subsumed under Fair Use. I believe in Germany, we have a separate rule for ephemeral copies during data processing and network transfers (§ 44a UrhG). So we don’t have to deal with that using a law that was more concerned with someone photocopying a book. And I believe some countries distinguish between commercial interests and non-profit research. Plus we have exemptions, for example allowing someone to play music at their non-profit events, even without the consent of the copyright holder. They still need to pay them a “fair” amount, but it’s not up to the copyright holder to decide… We specify under what circumstances libraries can use content, again differentiating between interests, and we have had a rudimentary law concerning data mining for research since 2017.
I think some specific laws like that would be more suited to guide the issue with AI towards a healthy solution than using one blunt tool for everything. Why not say AI training is allowed, but it requires fair compensation? We could even have a standardized way of opting in or out… I’m not sure if we need that. But I’m fine with my blog posts and Free Software projects ending up in some AI. But I don’t want them to listen in on my private conversations, like for example an Alexa could do… I believe that requires a law distinguishing between the two. If everything is Fair Use, I can say goodbye to privacy, but at the same time cancel my Netflix and Spotify subscriptions, since I’m going to claim I’m just collecting all of that for future AI training.
I – personally – think we can’t allow Amazon to spy on me and just claim it’s fair use. So context matters. And I also think the goal and nature of the AI matter. Rules for research should be less strict than for commercial interests. And I don’t think networking of digital devices can be handled the same way as AI training; I strongly believe that requires separate laws and also needs to factor in whether there is some legitimate interest to begin with.
Private conversations are something entirely different from publicly available data, and not really what we’re discussing here. Compensation for essentially making observations will inevitably lead to abuse of the system and deliver AI into the hands of the stupidly rich, something the world doesn’t need.
Private conversations are something entirely different from publicly available data
But that’s kind of the question here… Is data processing Fair Use in every case? If yes, we just also brought private conversations and everything else in. If not: what are the requirements? We now need to talk about which use cases we deem legitimate and what gets handled how… I think that’s exactly what we’re discussing here. IMHO that’s the point of the debate… It’s either everything… or nothing… or we need to discuss the details.
Compensation for essentially making observations will inevitably lead to abuse of the system and deliver AI into the hands of the stupidly rich, something the world doesn’t need.
I’m not sure about that. I mean, I gave some examples with licensing music at events and libraries (in general). Does that also get abused by the rich? I don’t think so. At least not that much, so that makes me think it might be a feasible approach. Of course it gets more complicated than that. Licensing music, for example, brings in collecting societies, and those agencies have proven to be problematic in various ways, and the whole licensing industry isn’t exactly fair – it also mainly shoves money into the hands of the rich… So a proper solution would be a bit more complicated than that.
I mean, I’d like to agree with you here and have a straightforward parallel on how to deal with AI training datasets. But I don’t think it’s as easy as that. We can’t just say processing data is Fair Use, because there are a lot of details involved, as I said with privacy. We can’t process private data and just do whatever with it. We can’t do everything with copyrighted material, even if it’s out in public. Whether a use is legitimate already depends on the details. And I think the same applies to AI. It needs a more nuanced perspective than just allowing or prohibiting everything.
I’m not discussing the use of private data, nor was I ever. You’re presenting a false dichotomy and trying to drag me into a completely unrelated discussion.
As for your other point: the difference between this and licensing for music samples is that the threshold for abuse is much, much lower. We’re not talking about hindering just expressive entertainment works. Research, reviews, reverse engineering, and even indexing information would be up in the air. This article by Tori Noble, a staff attorney at the Electronic Frontier Foundation, should explain it better than I can.
yeah this is not an ally that I want. fuck them.