this post was submitted on 02 Aug 2023
152 points (89.2% liked)

Technology


cross-posted to: https://lemmy.world/post/2499861

As I said, I made a lossy reformat of the database and a lossless one for 6.0 GiB (6,477,905,920 bytes), compared to ~26 GiB from Reddit, where fields are almost intentionally anti-compressed to take up more room.

If there is somewhere I can host it, let me know.

Also, I couldn't figure this out: do SQLite databases store any information on the creator or editor of a document?
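One way to check the question above is to read the file header directly. The SQLite file format defines a 100-byte header that has no author or editor field; the closest things to identifying metadata are the application ID, the user version, and the version number of the SQLite library that last wrote the file. The sketch below reads those fields. The function name `describe_sqlite_file` is made up for illustration, and the sketch says nothing about what the tables themselves might contain.

```python
import struct


def describe_sqlite_file(path):
    """Read the identifying fields from the 100-byte SQLite file header.

    Offsets follow the published SQLite file format. None of these fields
    records a user name, host name, or author.
    """
    with open(path, "rb") as f:
        header = f.read(100)
    if len(header) < 100:
        raise ValueError("file is too short to contain an SQLite header")

    def be_u32(offset):
        # Header integers are stored as big-endian 32-bit values.
        return struct.unpack(">I", header[offset:offset + 4])[0]

    return {
        "magic": header[:16],                 # b"SQLite format 3\x00"
        "user_version": be_u32(60),           # set via PRAGMA user_version
        "application_id": be_u32(68),         # set via PRAGMA application_id
        "sqlite_version_number": be_u32(96),  # library version that last wrote the file
    }
```

Calling `describe_sqlite_file("place.db")` (a placeholder path) returns a small dict; if `application_id` and `user_version` are both 0, the only trace left in the header is the SQLite library version.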

Why it's lossy: it's missing a large table of base64 urandom technically required to recreate the document fully.

all 9 comments
[–] BitSound@lemmy.world 59 points 1 year ago (1 children)

!datahoarder@lemmy.ml looks active and seems like a good place for it

[–] HelloHotel@lemmy.world 8 points 1 year ago* (last edited 1 year ago) (1 children)

Thanks, how do I crosspost/move this one?

[–] Skyhighatrist@lemmy.ca 15 points 1 year ago (1 children)

Using the web-ui, on this post there is an icon made up of two squares. It's right next to the star for saving the post. That's the cross post button.

[–] HelloHotel@lemmy.world 5 points 1 year ago* (last edited 1 year ago)

thanks, made updates to the post

[–] inspxtr@lemmy.world 21 points 1 year ago* (last edited 1 year ago)

Here are a few options that I've seen but never actually used.

Your data don’t seem to be massive compared to the types of data people store on there. So I don’t think it’s gonna be an issue. Plus, if you deposit your data in 1 archivist place + 1 research place, the data may be used by more people. Don’t forget about licenses btw.

EDIT: added https://socialmediaarchive.org/ to the list, just found out about that.

[–] fiat_lux@kbin.social 3 points 1 year ago (1 children)

Is this derived directly from the data reddit stored/created or is it a reconstruction of some kind from observing the r/place output? I'm tempted to look at the table structures but not tempted enough to download 4 gigs of it just yet.

[–] HelloHotel@lemmy.world 4 points 1 year ago* (last edited 1 year ago)

Rebuilt from Reddit's official sources. Still messing with optimizations: is adding a color definitions table worth it?

Edit: YES, only 32 unique colors ever.
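A minimal sketch of what such a color definitions table could look like, assuming the colors are stored as repeated hex strings. The table and column names (`placements`, `ts`, `x`, `y`, `color`, `colors`, `color_id`, `placements_compact`) are hypothetical and not taken from the actual database. With only 32 distinct values, each row can carry a small integer ID instead of a `#RRGGBB` string.

```python
import sqlite3


def add_color_table(conn):
    """Move repeated hex color strings into a lookup table of integer ids.

    All table and column names here are hypothetical placeholders.
    """
    conn.executescript("""
        CREATE TABLE colors (
            id  INTEGER PRIMARY KEY,
            hex TEXT NOT NULL UNIQUE
        );

        -- One row per distinct color; ids are assigned automatically.
        INSERT INTO colors (hex)
            SELECT DISTINCT color FROM placements ORDER BY color;

        -- Rebuild the big table with a small integer in place of the string.
        CREATE TABLE placements_compact AS
            SELECT p.ts, p.x, p.y, c.id AS color_id
            FROM placements AS p
            JOIN colors AS c ON c.hex = p.color;

        DROP TABLE placements;
    """)
```

The original strings come back with a join on `colors.id = placements_compact.color_id`, so the change is lossless. Running `VACUUM` afterwards lets SQLite hand the freed pages back to the filesystem.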