this post was submitted on 04 Jul 2023
11 points (82.4% liked)

Asklemmy


A loosely moderated place to ask open-ended questions


If your post meets the following criteria, it's welcome here!

  1. Open-ended question
  2. Not offensive: at this point, we do not have the bandwidth to moderate overtly political discussions. Assume best intent and be excellent to each other.
  3. Not regarding using or support for Lemmy: context, see the list of support communities and tools for finding communities below
  4. Not ad nauseam inducing: please make sure it is a question that would be new to most members
  5. An actual topic of discussion

Looking for support?

Looking for a community?

Icon by @Double_A@discuss.tchncs.de


Will we try to prevent Google (and other) scrapers?

The headline is pretty much a summary. "Google Says It will Scrape Everything You Post Online for AI" https://www.gizmodo.com.au/2023/07/google-says-it-will-scrape-everything-you-post-online-for-ai/

The first question is obviously: do we, as a community on Lemmy, even want to try to stop them from scraping our content here? If no, well, OK then.

If yes, how? I'm not sure whether "preventing access" for unregistered users would really stop this. I'm pretty sure Google has enough money and manpower to make it their mission to get around "can only be accessed by members" content.

top 8 comments
nottheengineer@feddit.de 24 points 1 year ago

Why would we want to stop that? It's a public forum, so it should get scraped.

Screak42@lemmy.ml 14 points 1 year ago

I personally agree with you. If content is not supposed to be searchable, maybe don't post it online. It is a different problem for writers, artists, and possibly even journalists.

But I think it's a fair debate, and unfortunately one that was one reason (or the only one?) the whole Reddit API debacle started.

On the other hand, maybe Lemmy should let certain communities offer an "only for members" view?

nottheengineer@feddit.de 3 points 1 year ago

The reddit API thing started because reddit thought they owned the content and could lock it behind a paywall for people who want training data. But that fundamentally isn't the case, so that whole thing backfired.

If someone wants to own the content and restrict access, they have to distribute it on their own instead of using a public platform. Lemmy is the wrong tool for that.

buckykat@lemmy.fmhy.ml 1 points 1 year ago

The Reddit scraping thing was always just a smokescreen for killing third-party apps.

Bjoern_Tantau@feddit.de 5 points 1 year ago

Really hard to prevent this, as copying its contents to other places is more or less the point of the Fediverse.

Kururin@talk.kururin.tech 5 points 1 year ago

I think it will eventually be scraped one way or another. If not Google, some other company will.

Kolanaki@yiffit.net 5 points 1 year ago

We're literally already posting to a public space that everyone everywhere can access if they're so inclined. I don't see how scraping that same content would make any difference, unless it significantly impacted site stability.

buckykat@lemmy.fmhy.ml 3 points 1 year ago

The more of my posts the AI scrapes the better its opinions will be