Programmer Humor


Post funny things about programming here! (Or just rant about your favourite programming language.)

Rules:

Posts must be relevant to programming, programmers, or computer science.
No NSFW content.
Jokes must be in good taste. No hate speech, bigotry, etc.

founded 5 years ago

MODERATORS

AgreeableLandscape@lemmy.ml

cat_programmer@lemmy.ml


DOGE employee (lemmy.world)

submitted 14 hours ago by SwordInStone@lemmy.world to c/programmerhumor@lemmy.ml

56 comments


tetris11@lemmy.ml 10 points 6 hours ago (last edited 6 hours ago)

I have to admit, PDF parsing being such a hot and profitable topic in computer science was really something I never saw coming.

PDFs? The things you can select text from? And when you can't, there's decent OCR? And when that fails, you just ask the person to send you an email or a Word doc?

It sounds like LLM developers are looking for a new, unpolluted source of historical training data, and that source exists in the form of old scanned paper documents. That's the only reason I can fathom for why this is such a big thing now.

chicken@lemmy.dbzer0.com 4 points 5 hours ago

Every time I try to convert a PDF to EPUB or something, or OCR one that doesn't actually have selectable text, it turns out shit. I assume the real reason people want to get LLMs involved is that there's a lot of ambiguity in what a correct conversion would even be, and there are a lot of PDFs out there.

sudo@programming.dev 3 points 6 hours ago

Training the most insane AI model on classified federal documents.