Paperless office; document/image processing

Everything related to maintaining a paperless office running on free software.

Discussions include image processing tools like GIMP, ImageMagick, unpaper, pdf2djvu, etc.

founded 9 months ago

MODERATORS

freedomPusher@sopuli.xyz

How to obtain the density (DPI / PPI) of a PGM file -- anyone know? ImageMagick does not cut it. (mander.xyz)

submitted 1 month ago* (last edited 1 month ago) by plantteacher@mander.xyz to c/paperless@sopuli.xyz

Running this gives the geometry but not the density:

$ identify -verbose myfile.pgm | grep -iE 'geometry|pixel|dens|size|dimen|inch|unit'

There is also a “Pixels per second” attribute, which means nothing to me. There is no density and not even a canvas/page dimension (which would make it possible to compute the density). The “Units” attribute on my source images is “undefined”.

Suggestions?
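
For what it's worth, the PGM (Netpbm) header only stores a magic number, the width, the height and the maximum gray value, so there is no density field for identify to report. A minimal Python sketch, assuming the physical page size is known from elsewhere (A4 is only an assumed default here, and the function names are made up for illustration), reads the pixel dimensions and derives the density from them:

```python
import io

MM_PER_INCH = 25.4


def read_pnm_size(stream):
    """Return (magic, width, height) from a Netpbm header such as P2/P5 PGM."""
    tokens = []
    while len(tokens) < 3:
        line = stream.readline()
        if not line:
            raise ValueError("header ended before width and height were found")
        # Everything after '#' on a header line is a comment.
        tokens.extend(line.split(b"#", 1)[0].split())
    return tokens[0], int(tokens[1]), int(tokens[2])


def estimate_dpi(width_px, height_px, page_mm=(210.0, 297.0)):
    """Density in pixels per inch, ASSUMING the scan covers a page of page_mm.

    The default of A4 portrait is an assumption, not something stored in the file.
    """
    dpi_x = width_px / (page_mm[0] / MM_PER_INCH)
    dpi_y = height_px / (page_mm[1] / MM_PER_INCH)
    return dpi_x, dpi_y


# Example with an in-memory header; a real file would be open("myfile.pgm", "rb").
sample = io.BytesIO(b"P5\n# scanned page\n2480 3508\n255\n")
magic, width, height = read_pnm_size(sample)
dpi_x, dpi_y = estimate_dpi(width, height)
print(magic.decode(), width, height, round(dpi_x), round(dpi_y))
```

If the horizontal and vertical results disagree noticeably, the assumed page size is probably wrong for that scan.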

Safe enough for public webserver? (discuss.tchncs.de)

submitted 6 months ago* (last edited 6 months ago) by ulo@discuss.tchncs.de to c/paperless@sopuli.xyz

I just discovered this software and like it very much.

Would you consider it safe enough to use it with my personal documents on a public webserver?

PDF renders radically different between Adobe Acrobat® vs. evince & okular (GhostScript-based) (mirror.ctan.org)

submitted 7 months ago* (last edited 7 months ago) by freedomPusher@sopuli.xyz to c/paperless@sopuli.xyz

The linked doc is a PDF which looks very different in Adobe Acrobat than it does in evince and okular, which I believe are both based on the same GhostScript library.

So the question is, is there an alternative free PDF viewer that does not rely on the GhostScript library for rendering?

#AskFedi

[solved] TIFF → DjVu conversion produces bigger file from bilevel doc than color (sopuli.xyz)

submitted 9 months ago* (last edited 8 months ago) by freedomPusher@sopuli.xyz to c/paperless@sopuli.xyz

I would like to get to the bottom of what I am doing wrong that leads to black and white documents having a bigger file size than color ones.

My process for a color TIFF is like this:

① tiff2pdf ② ocrmypdf ③ pdf2djvu

The resulting color DjVu file is ~56k. Running pdfimages -all on the intermediate PDF file shows that a CCITT (fax) image is inside.

My process for a black and white TIFF is the same:

① tiff2pdf ② ocrmypdf ③ pdf2djvu

The resulting black and white DjVu file is ~145k (almost 3× the color size). Running pdfimages -all on the intermediate PDF file shows that a PNG file is inside. If I replace step ① with ImageMagick’s convert, the first PDF is 10 MB, but the resulting DjVu file is still ~145k, and a PNG is still inside the intermediate PDF.

I can get the bitonal (bilevel) image smaller by using cjb2 -clean, which goes straight from TIFF to DjVu, but then I can’t OCR it because there is no intermediate PDF. And the size (~68k) is still bigger than the color doc.

update

I think I found the problem, which would not be evident from what I posted. I was passing the --force-ocr option to ocrmypdf. I did that just to push through errors like “this doc is already OCRd”. But that option does much more than you would expect: it transcodes the doc. Looks like my fix is to pass --redo-ocr instead. It’s not yet obvious to me why --force-ocr impacted bilevel images more.
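
For anyone who wants to reproduce the three steps with the fix applied, here is a rough Python sketch of the command sequence. The file names and function names are placeholders invented for illustration, and it assumes tiff2pdf, ocrmypdf and pdf2djvu are on the PATH; the only change from my original run is --redo-ocr in place of --force-ocr:

```python
import subprocess


def build_commands(tiff, djvu, work_pdf="step1.pdf", ocr_pdf="step2.pdf"):
    """Return the three command lines of the TIFF -> PDF -> OCR -> DjVu chain.

    work_pdf and ocr_pdf are hypothetical names for the intermediate files.
    """
    return [
        ["tiff2pdf", "-o", work_pdf, tiff],              # step 1: wrap the TIFF in a PDF
        ["ocrmypdf", "--redo-ocr", work_pdf, ocr_pdf],   # step 2: OCR, not --force-ocr
        ["pdf2djvu", "-o", djvu, ocr_pdf],               # step 3: convert to DjVu
    ]


def run_pipeline(tiff, djvu):
    """Run each step in order and stop at the first one that fails."""
    for cmd in build_commands(tiff, djvu):
        subprocess.run(cmd, check=True)


# Show the commands without running them.
for cmd in build_commands("scan.tif", "scan.djvu"):
    print(" ".join(cmd))
```

Calling run_pipeline("scan.tif", "scan.djvu") would execute the steps for real; checking the intermediate PDF with pdfimages after step 2 is still the way to confirm which image encoding survived.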

#askFedi