Self-hosted PDF manager?
from someacnt@sh.itjust.works to selfhosted@lemmy.world on 20 Apr 06:10
https://sh.itjust.works/post/36428539

I have a bunch of textbooks, lots of lecture notes, and notes from colleagues, all in PDF format. What is a good way to classify, manage, store, and read these PDF files? I am trying calibre-web, but it seems hard to find client apps that connect to it.

#selfhosted

starkzarn@infosec.pub on 20 Apr 06:19

Paperless-ngx! github.com/paperless-ngx/paperless-ngx

Feddinat0r@feddit.org on 20 Apr 07:41

I second this. I’ve been using it for about half a year as my complete document store: letters, anything.

Search is great, lovin it

b3an@lemmy.world on 20 Apr 11:43

I third this! I saw the title and came to say exactly that.

It’s still actively developed; I get update emails roughly once every 1–3 weeks, sometimes more often, sometimes less.

I run it with Docker Desktop. At one point I also lowkey learned how to set up a multi-database configuration for it, but kinda stopped once I got it working. It was more to see if I could.

I also tried building it on bare metal, but had shit luck. That was a couple of years ago, though. Docker just makes it easy as hell.

I still keep all the originals separate just in case, and the tool can help you keep additional copies too (like PDF/A). I’ve never needed to go back and use those, though, as Paperless just works so well once you get the hang of it and decide how you want your data stored.

I picked a storage layout that lets me find things easily even when the tool isn’t running (just by browsing the folder structure).

I haven’t made it available online yet, for obvious reasons. But it would be nice to be able to pull up pretty much any document I need, any time.

If you have any suggestions for quick, safe access from a phone (WireGuard, maybe?), they’d be helpful.

walden@sub.wetshaving.social on 20 Apr 11:56

For remote access, WireGuard is great. You can reach your services at their internal addresses.

nickiam2@aussie.zone on 20 Apr 16:29

Tailscale is how I access my server. I’ve got a domain name that points to the server’s internal Tailscale IP address, but that’s not really necessary.

non_burglar@lemmy.world on 20 Apr 14:24

Paperless-ngx is great, but it is particularly bad at handling PDF documents. Roughly half my documents just won’t import.

github.com/paperless-ngx/paperless-ngx/…/3933

reddit.com/…/paperlessngx_not_all_pdf_files_can_b…

github.com/paperless-ngx/paperless-ngx/…/2187

N0x0n@lemmy.ml on 20 Apr 07:33

It’s not self-hosted, so it doesn’t really answer your question… However, if you’re still a student, consider switching to Zotero.

What you can self-host, though, to make your books available everywhere, is a WebDAV server: link your books to Zotero through it and access them on every device.

If you’re serious about reading and studying, nothing beats Zotero!

Engywuck@lemm.ee on 20 Apr 09:01

You can also skip self-hosting the database and just keep your documents on a WebDAV server you control.

sonalder@lemmy.ml on 20 Apr 09:54

StirlingPDF?
Website: www.stirlingpdf.com
GitHub: github.com/Stirling-Tools/Stirling-PDF

mcchots@sh.itjust.works on 20 Apr 13:03

StirlingPDF is great, but more of a PDF editor.

OP wants something to store and manage their PDFs.

sonalder@lemmy.ml on 20 Apr 16:09

You’re right. I was certain it did both editing and management, but my memory played tricks on me.

m@social.tthi.as on 20 Apr 08:48

As a card-carrying librarian, I recommend using Zotero as a client with a WebDAV backend (I use Nextcloud).

If you’re studying or writing anything in which you need to cite your sources, Zotero is excellent and has integrations with many word processors. I’m pretty sure it can output your references as BibTeX if you’re in one of the disciplines that uses LaTeX.

mobotsar@sh.itjust.works on 20 Apr 12:25

I’m not even a librarian but pshh, I still got a card.

db_geek@norden.social on 20 Apr 06:19

@someacnt

Maybe paperless-ngx can be a solution for this.
https://github.com/paperless-ngx/paperless-ngx

meyotch@slrpnk.net on 20 Apr 12:53

Seconded, for the second time!

Paperless is very easy to install and maintain. I use it for both scientific PDFs and random receipts. It’s easy to keep them separate.

maxprime@lemmy.ml on 20 Apr 13:02

All great recommendations here. But I’ve heard good things about PdfDing. I haven’t used it myself but have followed development since the developer is quite active.

weker01@sh.itjust.works on 20 Apr 13:08

What’s wrong with just folders and file names?

raltoid@lemmy.world on 20 Apr 13:57

Benefits of data managers: tags for easier searching and grouping, grouping that is folder-agnostic, easily choosing a thumbnail, and, in text-focused ones, you can usually search the content of all your files from one place and quickly look through the results for the right one, etc.
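To illustrate the “folder-agnostic grouping” point, here is a minimal Python sketch of a tag index. It is only an illustration under assumptions: the file paths and tag names are made up, and this is not how Paperless, Calibre, or any other tool mentioned here actually stores tags.

```python
from collections import defaultdict


class TagIndex:
    """Minimal in-memory tag index: many tags per file, independent of folders (illustrative only)."""

    def __init__(self) -> None:
        # Maps each (lowercased) tag to the set of file paths carrying it.
        self._by_tag: dict[str, set[str]] = defaultdict(set)

    def tag(self, path: str, *tags: str) -> None:
        """Attach one or more tags to a file; the file can live in any folder."""
        for t in tags:
            self._by_tag[t.lower()].add(path)

    def find(self, *tags: str) -> list[str]:
        """Return files carrying ALL given tags, sorted for stable output."""
        if not tags:
            return []
        # .get() on a defaultdict does not create missing keys.
        sets = [self._by_tag.get(t.lower(), set()) for t in tags]
        return sorted(set.intersection(*sets))


# Hypothetical library: files in unrelated folders are grouped by topic.
index = TagIndex()
index.tag("textbooks/linear_algebra.pdf", "math", "textbook")
index.tag("notes/2023/lecture_05.pdf", "math", "lecture-notes")
index.tag("shared/alice/qm_summary.pdf", "physics", "lecture-notes")

print(index.find("math"))
print(index.find("lecture-notes", "math"))
```

A single folder tree forces each file into exactly one place; with tags, the same lecture note shows up under both “math” and “lecture-notes” without being copied.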

Appoxo@lemmy.dbzer0.com on 20 Apr 23:42

The same reason why immich fucking rocks.

Search for “ocean” or “moon” and it shows you matching pictures.

If those document management solutions can also do OCR, it’s a double win.

weker01@sh.itjust.works on 21 Apr 00:05

Hmmm. Maybe I should try that, then. I never really understood why people like these managers, as I was always satisfied with the directory tree for organization.

Well, maybe except for music. There, beets fucking rocks. But in the end I also only use it to sort music into a directory structure.

Appoxo@lemmy.dbzer0.com on 21 Apr 08:04

I configured my immich to sort the photos by YYYY/MM/#original-photo-name#_assetnumber

That way, should I ever leave immich, I can still recover or view my files.
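A rough Python sketch of that kind of date-based layout (YYYY/MM/original name plus an asset number). It only mimics the path pattern described above; it is not immich’s storage-template engine, and the helper name `storage_path` and the `asset_number` counter are hypothetical.

```python
from datetime import date
from pathlib import PurePosixPath


def storage_path(taken: date, original_name: str, asset_number: int) -> PurePosixPath:
    """Build a YYYY/MM/<name>_<asset number><ext> path (hypothetical helper mimicking the layout above)."""
    original = PurePosixPath(original_name)
    # Keep the original stem and extension, append the asset number to avoid name clashes.
    filename = f"{original.stem}_{asset_number}{original.suffix}"
    return PurePosixPath(f"{taken.year:04d}", f"{taken.month:02d}", filename)


print(storage_path(date(2024, 3, 9), "IMG_0042.jpg", 7))  # 2024/03/IMG_0042_7.jpg
```

Because the path depends only on the date and the original file name, the resulting tree stays browsable with a plain file manager even if the app is gone.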

asceticism@lemmy.world on 20 Apr 13:40

Not quite the intended application, but Linkwarden would work. It stores all your links and also archives everything as HTML, plain text, and PDF. You can categorize and tag content, and there are filter and search tools.

You can also just give it PDFs and it will import them. It only saves them as PDFs, but that would still work.

I’m guessing this is not the best approach but wanted to give you options.

thejevans@lemmy.ml on 20 Apr 14:46

Not necessarily self-hosted, but TagStudio is an interesting project worth keeping an eye on: docs.tagstud.io

philpo@feddit.org on 20 Apr 15:18

Contrary to the others here: while I love Paperless, using it for textbooks and notes only worked “somewhat” for me; it becomes quite clunky after a while.

Personally, if I were you and had more textbooks than notes, I would rather go with Calibre. Notes can be attached there as well, and organised better than in Paperless.

(And don’t get me wrong, Paperless is awesome and I use it heavily.)

someacnt@sh.itjust.works on 21 Apr 13:54

Thanks, I am trying both Paperless and Calibre to see which works better for which tasks.

grimer@lemmy.world on 20 Apr 15:57

Not sure what your preferred platform is but I’ve had great success connecting to my Calibre-web site with Yomu on iOS.

stegosaur5491@lemm.ee on 20 Apr 16:43

Maybe have a look at PdfDing. It isn’t too bloated and is pretty simple, but it has all the features that are essential for me. I like it.

xgranade@lemmy.blahaj.zone on 20 Apr 21:09

I’m a big fan of Docspell. There are lots of ways to import documents (watching a folder, watching an e-mail account, etc.), and it plays really well with my IdP instance over OIDC.

Sunny@slrpnk.net on 20 Apr 22:35

I believe this new project should meet your needs quite well!

Papra is quite new in the self-hosted sphere, but a welcome addition. I’ve yet to test it myself, but it sounds and looks very promising: github.com/papra-hq/papra