this post was submitted on 22 Aug 2023
780 points (95.7% liked)

Fediverse

17903 readers
80 users here now

A community dedicated to fediverse news and discussion.

Fediverse is a portmanteau of "federation" and "universe".

Getting started on Fediverse;

founded 5 years ago
MODERATORS
 

It is probably due to a number of people stopping using their alts after some instance hopping.

Also a few people who came to see how it was, and weren't attracted enough to become regular visitors.

Curious to see at which number we'll stabilize.

Next peak will probably happen after either major features release (e.g. exhaustive mod tools allowing reluctant communities to move from Reddit) or the next Reddit fuck up (e.g. removing old.reddit)

Stats on each server: https://lemmy.fediverse.observer/list

you are viewing a single comment's thread
view the rest of the comments
[–] dingus@lemmy.ml 3 points 1 year ago* (last edited 1 year ago) (2 children)

They are perfectly normal. Unlike giant corporations, the people who run Lemmy don't have the money to support a fleet of failover servers that take over when the main server goes offline. That's basically the only reason you don't see lots of downtime from major corporations: investment in redundancy, so when something breaks, a perfect copy takes over. Server crashes happen all the time for major corporations, you just never see them due to investment in redundancy.

That's the difference between a community and a company. One takes actual investment from the community as a whole, and the other ruthlessly exploits for profit.

[–] RoundSparrow@lemmy.ml 11 points 1 year ago* (last edited 1 year ago) (2 children)

the people who run Lemmy don’t have the money to support a fleet of failover servers that take over when the main server goes offline.

That has nothing to do with the issue I'm talking about. Every server with the amount of data in them would fail. Doesn't matter if you had 100 servers on standby.

The Rust logic for database access and PostgreSQL logic in lemmy is unoptimized and there is a serious lack of Diesel programming skills. site_aggregates table had a mistake where 1500 rows were updated for every single new comment and post - and it only got noticed when lemmy.ca was crashing so hard they made a complete copy of the data and studied what was gong on.

Throwing hardware at it, as you describe, has been the other thing... massive numbers of CPU cores. What's needed is to learn what Reddit did before 2010 with PostgreSQL.... as Reddit also used PostgreSQL (and is open source).

That’s basically the only reason you don’t see lots of downtime from major corporations: investment in redundancy,

Downtime because you avoid using Redis or Memcached caching at all costs in your project isn't common to see in major corporations. But Lemmy avoids caching any data from PostgreSQL at all costs. Been that way for several years. May 17, 2010: "Lesson 5: Memcache;"

As I said in my very first comment, server crashing as a way to scale is a very interesting approach.

EDIT: Freudian slip, "memecached" instead of Memcached

[–] dingus@lemmy.ml 1 points 1 year ago* (last edited 1 year ago) (1 children)

That's a much more... coherent explanation than your original one, friend. I wouldn't have argued this point if you had started here.

[–] RoundSparrow@lemmy.ml 2 points 1 year ago* (last edited 1 year ago) (1 children)

If anyone bothered to actually look at the SQL SELECT that Lemmy uses to list posts every time you hit refresh it would be blindingly obvious how convoluted it is. yet the community does not talk about the programming issues and instead keeps raising money for 64 core hardware upgrades without recognizing just how tiny Lemmy's database really is and how 57K users is not a large number at all!

your original one, friend. I wouldn’t have argued this point if you had started here.

I mentioned "ORM" right in my first comment.

SELECT 
   "post"."id" AS post_id, "post"."name" AS post_title,
   -- "post"."url", "post"."body", "post"."creator_id", "post"."community_id", "post"."removed", "post"."locked", "post"."published", "post"."updated", "post"."deleted", "post"."nsfw", "post"."embed_title", "post"."embed_description", "post"."thumbnail_url",
   -- "post"."ap_id", "post"."local", "post"."embed_video_url", "post"."language_id", "post"."featured_community", "post"."featured_local",
     "person"."id" AS p_id, "person"."name",
     -- "person"."display_name", "person"."avatar", "person"."banned", "person"."published", "person"."updated",
     -- "person"."actor_id", "person"."bio", "person"."local", "person"."private_key", "person"."public_key", "person"."last_refreshed_at", "person"."banner", "person"."deleted", "person"."inbox_url", "person"."shared_inbox_url", "person"."matrix_user_id", "person"."admin",
     -- "person"."bot_account", "person"."ban_expires",
     "person"."instance_id" AS p_inst,
   "community"."id" AS c_id, "community"."name" AS community_name,
   -- "community"."title", "community"."description", "community"."removed", "community"."published", "community"."updated", "community"."deleted",
   -- "community"."nsfw", "community"."actor_id", "community"."local", "community"."private_key", "community"."public_key", "community"."last_refreshed_at", "community"."icon", "community"."banner",
   -- "community"."followers_url", "community"."inbox_url", "community"."shared_inbox_url", "community"."hidden", "community"."posting_restricted_to_mods",
   "community"."instance_id" AS c_inst,
   -- "community"."moderators_url", "community"."featured_url",
     ("community_person_ban"."id" IS NOT NULL) AS ban,
   -- "post_aggregates"."id", "post_aggregates"."post_id", "post_aggregates"."comments", "post_aggregates"."score", "post_aggregates"."upvotes", "post_aggregates"."downvotes", "post_aggregates"."published",
   -- "post_aggregates"."newest_comment_time_necro", "post_aggregates"."newest_comment_time", "post_aggregates"."featured_community", "post_aggregates"."featured_local",
   --"post_aggregates"."hot_rank", "post_aggregates"."hot_rank_active", "post_aggregates"."community_id", "post_aggregates"."creator_id", "post_aggregates"."controversy_rank",
   --  "community_follower"."pending",
   ("post_saved"."id" IS NOT NULL) AS save,
   ("post_read"."id" IS NOT NULL) AS read,
   ("person_block"."id" IS NOT NULL) as block,
   "post_like"."score",
   coalesce(("post_aggregates"."comments" - "person_post_aggregates"."read_comments"), "post_aggregates"."comments") AS unread

FROM (
   ((((((((((
   (
	   (
	   "post_aggregates" 
	   INNER JOIN "person" ON ("post_aggregates"."creator_id" = "person"."id")
	   )
   INNER JOIN "community" ON ("post_aggregates"."community_id" = "community"."id")
   )
   LEFT OUTER JOIN "community_person_ban"
       ON (("post_aggregates"."community_id" = "community_person_ban"."community_id") AND ("community_person_ban"."person_id" = "post_aggregates"."creator_id"))
   )
   INNER JOIN "post" ON ("post_aggregates"."post_id" = "post"."id")
   )
   LEFT OUTER JOIN "community_follower" ON (("post_aggregates"."community_id" = "community_follower"."community_id") AND ("community_follower"."person_id" = 3))
   )
   LEFT OUTER JOIN "community_moderator" ON (("post"."community_id" = "community_moderator"."community_id") AND ("community_moderator"."person_id" = 3))
   )
   LEFT OUTER JOIN "post_saved" ON (("post_aggregates"."post_id" = "post_saved"."post_id") AND ("post_saved"."person_id" = 3))
   )
   LEFT OUTER JOIN "post_read" ON (("post_aggregates"."post_id" = "post_read"."post_id") AND ("post_read"."person_id" = 3))
   )
   LEFT OUTER JOIN "person_block" ON (("post_aggregates"."creator_id" = "person_block"."target_id") AND ("person_block"."person_id" = 3))
   )
   LEFT OUTER JOIN "post_like" ON (("post_aggregates"."post_id" = "post_like"."post_id") AND ("post_like"."person_id" = 3))
   )
   LEFT OUTER JOIN "person_post_aggregates" ON (("post_aggregates"."post_id" = "person_post_aggregates"."post_id") AND ("person_post_aggregates"."person_id" = 3))
   )
   LEFT OUTER JOIN "community_block" ON (("post_aggregates"."community_id" = "community_block"."community_id") AND ("community_block"."person_id" = 3)))
   LEFT OUTER JOIN "local_user_language" ON (("post"."language_id" = "local_user_language"."language_id") AND ("local_user_language"."local_user_id" = 3))
   )
WHERE (((((((
  ((("community"."deleted" = false) AND ("post"."deleted" = false)) AND ("community"."removed" = false))
  AND ("post"."removed" = false)) AND ("post_aggregates"."creator_id" = 3)) AND ("post"."nsfw" = false))
  AND ("community"."nsfw" = false)) AND ("local_user_language"."language_id" IS NOT NULL))
  AND ("community_block"."person_id" IS NULL))
  AND ("person_block"."person_id" IS NULL))
ORDER BY "post_aggregates"."featured_local" DESC , "post_aggregates"."published" DESC
LIMIT 10
OFFSET 0
;
[–] gonzo0815@sh.itjust.works -3 points 1 year ago (1 children)

Damn, so many joins :/

How could this monster be optimized though?

[–] RoundSparrow@lemmy.ml 7 points 1 year ago* (last edited 1 year ago) (1 children)

First optimization is to not fetch every field and prune it down. For example, it gets public key and private key for every community and user account - then does nothing with them. That's just pushing data between Rust and PostgreSQL for no reason. That kind of thing is pretty obvious.. the huge number of things listed after "SELECT".

The whole approach is what I recently described as: make a JOIN fusion implosion bomb, then wait for null columns to fall out

There are short-term and long-term solutions. Right now there is already a new feature that will add one more JOIN that is pending merge.... "instance blocking" by each single user.

Based on the server overloads and resulting crashes, I think some obvious solutions would be to remove post_aggregates table entirely and just throw more columns on the post table... I've seen people do stuff like that. But really you have to have a concept of core foundation.

To me the core foundation of Lemmy data is that people want fresh meat, when world events get into a frenzy, they want to F5 and get the LATEST post and the LATEST comments. Data should have a big wall between the most recent 5 days and everything else. It's the heart of the beast of human events and a platform like this.

From that perspective, that fresh posts and fresh comments mean everything, you can optimize by just doing a INNER SELECT before any JOIN... or partition the database TABLE into recent and non-recent, or some out-of-band steps to prepare recent data before this SELECT even comes up from an API call... and not let PostgreSQL do so much heavy lifting each page refresh.

[–] gonzo0815@sh.itjust.works -2 points 1 year ago

If I remember, I'm gonna look into that tomorrow when I'm not on a phone screen. Not that I could contribute anything, but this seems like a good opportunity to learn some advanced stuff. Thanks for your answer!

[–] TrustingZebra@lemmy.one 4 points 1 year ago (1 children)

Community stuff can work well if done right. For example you don't see Debian repositories constantly crashing.

[–] RoundSparrow@lemmy.ml 0 points 1 year ago* (last edited 1 year ago)

Community stuff can work well if done right. For example you don’t see Debian repositories constantly crashing.

I don't follow your comment, are you suggesting I said something negative about open source project communities? I was talking about the Lemmy social media communities who actually comment and fund the 64-core server upgrades without asking why the site crashes with only 57K users.... the people who comment and post on Lemmy.... not the "open source" programmer community, but the social media community of Lemmy.