Lemmy.World Announcements

This Community is intended for posts about the Lemmy.world server by the admins.


What is your opinion of the Large Language Model (LLM) argument made by Reddit? (lemmy.world)

submitted 1 year ago by FearTheCron@lemmy.world to c/lemmyworld@lemmy.world


One of the arguments made for Reddit's API changes is that Reddit is now the go-to place for LLM training data (e.g. for ChatGPT).

https://www.reddit.com/r/reddit/comments/145bram/addressing_the_community_about_changes_to_our_api/jnk9izp/?context=3

I haven't seen a whole lot of discussion around this and would like to hear people's opinions. Are you concerned about your posts being used for LLM training? Do you not care? Do you prefer that your comments are available to train open source LLMs?

(I will post my personal opinion in a comment so it can be up/down voted separately)


[–] OptimusPrime@lemmynsfw.com 1 points 1 year ago* (last edited 1 year ago) (1 children)

Bullshit. This assumes the people training LLMs are the same ones building the datasets. Once a dataset is created, it can be used to train multiple models, meaning that there's no further impact on API usage.

[–] FearTheCron@lemmy.world 0 points 1 year ago (1 children)

Certainly, the archived Reddit posts will be used for that for years to come regardless. What I am curious about is how you feel about your posts contributing to the output of an LLM (independent of API usage costs).

LLMs can be specialized for particular tasks by training them further on a curated set of data. For example, an LLM trained specifically on your posts will sound more like you than it did before that training. Does it bother you that someone may use your posts for this purpose?

[–] OptimusPrime@lemmynsfw.com 1 points 1 year ago* (last edited 1 year ago)

Well, these AIs are already being trained on public figures, and there isn't much those people can do unless someone livestreams with the AI impersonating them, which might let them identify who is behind it. How would people even figure out that there's an LLM out there that speaks just like them? It's similar to fine-tuning AIs on artists' work to create art that mimics their style. It can be frustrating, but there isn't much anyone can do short of installing surveillance software on every computer. In summary, I don't mind, because I won't even find out.