this post was submitted on 05 Sep 2024

567 points (96.2% liked)


Bots are running rampant. How do we stop them from ruining Lemmy? (lemmy.world)

submitted 2 months ago* (last edited 2 months ago) by Buttflapper@lemmy.world to c/technology@lemmy.world


Social media platforms like Twitter and Reddit are increasingly infested with bots and fake accounts, leading to significant manipulation of public discourse. These bots don't just annoy users—they skew visibility through vote manipulation. Fake accounts and automated scripts systematically downvote posts opposing certain viewpoints, distorting the content that surfaces and amplifying specific agendas.

Before coming to Lemmy, I was systematically downvoted by bots on Reddit for completely normal comments that were relatively neutral and not controversial at all. There seemed to be no pattern to it... One time I commented that my favorite game was WoW and got downvoted to -15 for no apparent reason.

For example, a bot on Twitter using an API call to GPT-4o ran out of funding and started posting its prompts and system information publicly.

https://www.dailydot.com/debug/chatgpt-bot-x-russian-campaign-meme/

Example shown here

Bots like these are probably in the tens or hundreds of thousands. They did a huge ban wave of bots on Reddit, and some major top level subreddits were quiet for days because of it. Unbelievable...

How do we even fix this issue or prevent it from affecting Lemmy??


[–] QuadratureSurfer@lemmy.world 3 points 2 months ago (2 children)

Easy way to get around that with "virtual" addresses: https://ipostal1.com/virtual-address.php

Just pay $10 for every account that you want to create.... you may as well just go with the solution of charging everyone $10 to create an account. At least that way the instance owner is getting supported and it would have the same effect.

[–] tal@lemmy.today 4 points 2 months ago* (last edited 2 months ago) (2 children)

Just pay $10 for every account that you want to create

So, making identities expensive helps. It'd probably filter out some. But, look at the bot in OP's image. The bot's operator clearly paid for a blue checkmark. That's (checks) $8/mo, so the operator paid at least $8, and it clearly wasn't enough to deter them. In fact, they chose the blue checkmark because the additional credibility was worth it; X doesn't mandate that they get one.

And it also will deter humans. I don't personally really care about the $10 because I like this environment, but creating that kind of up-front barrier is going to make a lot of people not try a system. And a lot of times financial transactions come with privacy issues, because a lot of governments get really twitchy about money-laundering via anonymous transactions.

EDIT: I think that maybe a better route is to try to give users a "credibility score". So, that's not a binary "in" or "out". But other people can see some kind of automated assessment of how likely, for example, a person might be to be a bot.

thinks more

I mean, this is just spitballing, but could even be done not at a global level, but at a per-other-user level. Like, okay, suppose you have what amounts to a small neural network, right? So the instance computes a bunch of statistics about each user, like account age, stuff like that, and then provides that to the client. But it doesn't determine the importance of those metrics in whether the other user should see that post, just provides the raw data. You've got a bunch of inputs to a neural net, then. Then the other user can have a set of classifications. Maybe just "hide", but also maybe something like "bot" or "political activism" or whatever. And it takes those input metrics from the instances, and trains that neural net to produce client-side classifications, and then auto-tags users based on that. That's gonna be a pain to try to defeat, because the bot operator can't even see how they're being scored -- they haven't "gotten over the hurdle" or not.

But you don't want to make every end user train a neural net from scratch. Hmm.

So maybe what you do is let users create their own scores and expose those to other users, right? I think that I read that BlueSky does something like that, was working on letting users create "curated feeds" for other users. They're doing something simpler, no machine learning, but that's got some drawbacks, means that you have to spend more time determining whether a score is good. So, okay. Say I'm gonna try to score a user based on whether or not I think that they're a bot. I have the option to make that score publicly available. Other users can "subscribe" to that metric, and when they do, there's a new input node added to their local classifier's list of input nodes. Like, "Don's Bot List".

But I don't have to subscribe to Don's Bot List, and even if I do, it doesn't mean that I automatically consider that other user a bot. Don's rating is just an input into whether my own classifier considers them a bot. If I regularly disagree with Don, even if I'm subscribed to his list, my local neural net will slash the importance of his rating. If I agree with Don unless some other input to my classifier's neural net is triggered, then the classifier can learn that.
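To make the idea above a bit more concrete, here is a minimal sketch, assuming the "small neural network" is reduced to a single-layer one (plain logistic regression) that the client trains on its own labels. Everything named here is hypothetical: the class `LocalClassifier`, the feature names `new_account` and `posts_per_hour_high`, and the list key `dons_bot_list` are made up for illustration and are not part of Lemmy or any real client.

```python
import math


class LocalClassifier:
    """Single-layer 'neural net' (logistic regression) trained client-side."""

    def __init__(self, feature_names, learning_rate=0.5):
        # One weight per instance-supplied metric (hypothetical names).
        self.weights = {name: 0.0 for name in feature_names}
        self.bias = 0.0
        self.learning_rate = learning_rate

    def subscribe(self, list_name):
        # A subscribed list is just one more input node, starting at weight 0.
        self.weights.setdefault(list_name, 0.0)

    def score(self, features):
        # Probability (0..1) that this user belongs to the "bot" class.
        z = self.bias + sum(
            w * features.get(name, 0.0) for name, w in self.weights.items()
        )
        return 1.0 / (1.0 + math.exp(-z))

    def train(self, features, is_bot):
        # One gradient-descent step on log loss for a single labelled user.
        error = self.score(features) - (1.0 if is_bot else 0.0)
        for name in self.weights:
            self.weights[name] -= self.learning_rate * error * features.get(name, 0.0)
        self.bias -= self.learning_rate * error


clf = LocalClassifier(["new_account", "posts_per_hour_high"])
clf.subscribe("dons_bot_list")

# My own labels. Don's rating agrees with me half the time, so it carries
# no information for me, while the instance metrics line up with my labels.
my_labels = [
    ({"new_account": 1, "posts_per_hour_high": 1, "dons_bot_list": 1}, True),
    ({"new_account": 0, "posts_per_hour_high": 0, "dons_bot_list": 1}, False),
    ({"new_account": 1, "posts_per_hour_high": 1, "dons_bot_list": 0}, True),
    ({"new_account": 0, "posts_per_hour_high": 0, "dons_bot_list": 0}, False),
]
for _ in range(200):
    for features, is_bot in my_labels:
        clf.train(features, is_bot)

print(clf.weights)
```

In this toy run the weight on `dons_bot_list` stays close to zero while the metric weights grow, which is the "slash the importance of his rating" behaviour described above. A real client would need far more labels and inputs than this.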

[–] QuadratureSurfer@lemmy.world 6 points 2 months ago

Yep, exactly this. It might deter some small time bot creators, but it won't stop larger operations and may even help them to seem more legitimate.

If anything, my favorite idea comes from this xkcd:

https://xkcd.com/810/

[–] Dark_Arc@social.packetloss.gg 1 points 2 months ago

Yeah, BlueSky has this concept of user moderation lists. It's effectively like subscribing to an adblock filter. There might be some things blocked by patterns (e.g., you could have one that blocks anything that involves spiders) and there might be others that block specific accounts (e.g., you could have one that blocks users that are known to cause problems, are prone to vulgar language, etc).

I think the problem with credibility scores in general, though, is that it's sort of like a "social score" from Black Mirror. Real people can get caught in the net of "you look like a bot", and similarly, different algorithms could be designed to game the system by gaming the metrics to look like they're not a bot (possibly even more so than some of the real people).

This is kind of what led me down the route of bringing things back into the physical world. Like, once you have things going back through the normal systems ... you arguably do lose some level of anonymity but you also gain back some guarantees of humanity.

It doesn't need to be the level of "you've got a government ID and you're verified to be exactly you with no other accounts" ... just "hey, some number of people in the real world, that are subject to the respective nation's laws, had to have come into contact with a real piece of mail."

Maybe that just turns into the world's slowest UDP network in existence. However, I think it has a real chance of making it easier to detect real people (i.e., folks that have a small number of overlapping addresses). The virtual mailbox the other person gave has 3,000 addresses... if you assume 5 people per mailing address is normal that's 15,000 bots total before things start getting fishy if you've evenly distributed all of those addresses. If you've got 3,000 accounts at the same address, that's very fishy. Addresses also change a lot less frequently than IP addresses, so a physical address ban is a much more strict deterrent.
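The arithmetic above can be sketched roughly like this, assuming the instance already holds one verified mailing address per account as a normalized string. The function names and the `max_per_address=5` default are hypothetical; the 3,000 and 5 figures are the ones from the comment.

```python
from collections import Counter


def capacity_before_suspicion(num_addresses, max_per_address=5):
    """How many accounts fit if they are spread evenly under the limit."""
    return num_addresses * max_per_address


def flag_crowded_addresses(account_addresses, max_per_address=5):
    """Return {address: account_count} for addresses over the limit.

    account_addresses maps account name -> normalized mailing address.
    """
    counts = Counter(account_addresses.values())
    return {addr: n for addr, n in counts.items() if n > max_per_address}


# 3,000 virtual addresses at 5 accounts each.
print(capacity_before_suspicion(3000))

# Seven accounts registered at one (made-up) address, one at another.
accounts = {f"bot{i}": "100 Example St, Suite 7" for i in range(7)}
accounts["alice"] = "12 Elm Rd"
print(flag_crowded_addresses(accounts))
```

This only catches the "3,000 accounts at the same address" case. An operator who spreads accounts evenly under the limit would not be flagged by a count alone.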

[–] Dark_Arc@social.packetloss.gg 3 points 2 months ago* (last edited 2 months ago) (1 children)

Hm... I'm not sure if this is enough to defeat the strategy.

It looks like even with that service, you have to sign up for Form 1583.

Even if they're willing to incur the cost, there's a real paper trail pointing back to a real person or organization. In other words, the bot operator can be identified.

As you note, this is yet another additional cost. So, you'd have say ... $2-3 for the card + an address for the account. If you require every unique address to have no more than 1 account ... that's $13 per bot plus a paper trail to set everything up.

That certainly wouldn't stop every bot out there ... but the chances of large-scale bot farms operating seem like they would be significantly deterred, no?
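As a back-of-the-envelope sketch of that cost estimate, assuming the upper $3 figure for the card and a flat $10 per virtual address (whether that $10 is one-time or recurring isn't stated in the thread, so treat it as an assumption). The function name and parameters are hypothetical.

```python
def farm_cost(num_bots, card_cost=3.0, address_cost=10.0, accounts_per_address=1):
    """Rough setup cost in dollars for a bot farm under an address limit."""
    # Ceiling division: each address can hold at most accounts_per_address bots.
    addresses_needed = -(-num_bots // accounts_per_address)
    return num_bots * card_cost + addresses_needed * address_cost


print(farm_cost(1))        # one bot, one address
print(farm_cost(10_000))   # a 10,000-bot farm at one account per address
print(farm_cost(10_000, accounts_per_address=5))  # looser limit, cheaper farm
```

The per-bot price matters less than how it scales: with one account per address, the cost grows linearly with farm size, and every address adds to the paper trail.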

[–] QuadratureSurfer@lemmy.world 1 points 2 months ago (1 children)

That's a good point. I didn't know about the USPS Form 1583 for virtual mailboxes... Although that is a U.S.-specific thing, so finding a similar service in a country that doesn't care so much might be the way to go about that.

[–] Dark_Arc@social.packetloss.gg 2 points 2 months ago (1 children)

True, though presumably users in those places would be stuck with the "less trustworthy" instances (and ideally, would be able to get their local laws changed to make themselves more trustworthy).

It's definitely not perfectly moral... but little in the world is, and maybe it's sufficiently pragmatic.

[–] QuadratureSurfer@lemmy.world 1 points 2 months ago* (last edited 2 months ago)

Yeah, the other thing I could see happening is a tactic similar to one scammers use, where mules pick up mail from various Airbnbs throughout whatever country, but this would definitely limit most bot operations... Unless some organization specializes in this and just offers some service to create a bunch of accounts for anyone willing to pay.

Also, how many accounts would you limit to a single address, and how long would you lock up an address before it could be used again (given that people do move around from time to time)?

edit:typo.