this post was submitted on 10 Jun 2023
Asklemmy
I hate Reddit. But it feels like the Library of Alexandria burning down (yeah, I know). All those Google search results and educational subreddits are shutting down forever, and because they are too small, Reddit won't force them open again.
A lot of them are in the Pushshift archive, but that cuts off at 2022. Also, it doesn't include a lot of the smaller subreddits.
I have had my PC running 24/7 with multiple VPNs to avoid rate limits, downloading as much as I can before the API dies, but with some blackouts moving forward a day, I have already missed a few.
Like many others, I would often add "reddit" to the end of my searches to get better results. Half the sites in web search results now are either AI-generated, copies, or completely ad-ridden websites that ask you to turn off your ad blocker.
How exactly does this Pushshift archive work? I downloaded some .zst files from it, but what do I do with them?
The file you downloaded is a compressed JSON file, so it's not something you can just open and look at. But it contains all the data you would need to build a nice UI around it.
I don't know what OS you are on, but on Linux you can run
zstd -d -c file.zst | jq .
and it will print everything in the file. The output is not really readable, though. Also, the file doesn't contain any of the media content, only the text.
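If you want something a bit more readable than the raw jq output, a short Python script can stream the file and print just a few fields. This is only a rough sketch under some assumptions that are mine, not something confirmed above: that the dump holds one JSON object per line, that the `zstd` command-line tool is installed, that the file may need the `--long=31` window option to decompress, and that the objects carry Reddit-style fields such as `author`, `subreddit`, `body`, and `title`. The helper names (`read_zst_lines`, `iter_records`, `print_posts`) are made up for this example.

```python
import io
import itertools
import json
import subprocess
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per line of newline-delimited JSON.

    Blank lines and lines that are not valid JSON objects are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of aborting
        if isinstance(record, dict):
            yield record


def read_zst_lines(path: str) -> Iterator[str]:
    """Stream decompressed text lines by piping the file through the zstd CLI.

    Assumption: --long=31 is passed in case the archive was written with a
    large window; it should be harmless for files that do not need it.
    """
    cmd = ["zstd", "-d", "-c", "--long=31", path]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        yield from io.TextIOWrapper(proc.stdout, encoding="utf-8", errors="replace")


def print_posts(path: str, limit: int = 20) -> None:
    """Print the first `limit` records in a compact, readable form."""
    records = iter_records(read_zst_lines(path))
    for rec in itertools.islice(records, limit):
        # Field names are assumed; .get() avoids a crash if one is missing.
        print(rec.get("author"), "|", rec.get("subreddit"))
        print(rec.get("body") or rec.get("title") or "")
        print("-" * 40)
```

Calling `print_posts("file.zst")` would print the first 20 entries. Because everything is streamed line by line, the whole archive never has to fit in memory, which matters since these dumps can be very large once decompressed.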