this post was submitted on 22 Dec 2024
Technology
"Unlicensed" from the trainer's point of view means they used content without contacting or licensing it from an owner who never approved that use. If it's posted under Creative Commons, that's fine. If it's posted with a notice that it isn't open in any way and isn't for corporate use, then they need to contact the owner and license it.
They won't need to; they'll get it from Getty. All these websites have a ToS that makes it very clear they can do whatever they want with what you upload. The courts will simply never side with the small-time photographer who makes $50 a month from stock photos hosted on someone else's website. The laws will favor data brokers and the handful of big AI companies.
Anyone self-hosting will simply never get a call. Journalists will keep the same salary while the newspaper's owner gets a fat bonus. Reddit has already sold its data for $60 million, and none of that went anywhere but spez's coke fund.
Two things:
Getty is not expressly licensed as "free to use", and by default it is not licensed for any commercial use. That's how they are still alive as a business.
You're talking about generative AI junk, not LLMs, which are what this discussion and the original post are about. They are not the same thing.
Reddit and newspapers selling their data preemptively has to do with LLMs. Can you clarify what scenario you are aiming for? It sounds like you want the courts to rule that AI companies must ask each individual redditor whether they can use his comments for training. Personally, I don't see that happening.
Getty gives itself the right to license all uploaded photos, and it has already trained a generative model on them, btw.
EULAs and ToS agreements protect Reddit and similar sites from being sued. They changed them before they started selling the data and barely gave notice (see the Reddit exodus, pt. 2), but if you keep using the service, you agree to both, and they can get away with it because they own the platform.
Anyone whose content is on a platform like that, and who had the rug pulled out from under them by silent amendments allowing it, is unfortunately fucked.
Any other platform that didn't explicitly state this was happening is not fair game for these training tools to just grab and train on. What we do know is that OpenAI, at the very least, was training on public sites that didn't explicitly allow it: personal blogs, Wikipedia, etc.