How do you manage your (financial) documents?

FWIW, in my case I favor a self-hosted approach. I scan paper documents as they come in with the CamScanner Pro Android app and share the resulting file to the Syncthing app, which uploads it to a specific folder on my NAS. The NAS then uses the OCRmyPDF Python script to automatically OCR the PDF and moves it to a year/month-based directory. I can then use the Ambar software to run a full-text search across every document on my NAS. The quick scanning/OCR procedure makes scanning documents a no-brainer, and full-text search makes it very easy to find them when I need them (in a matter of seconds). The software stack on the NAS is configured with docker-compose.
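The NAS-side step above (OCR each new scan, then file it into a year/month folder) could look roughly like the bash sketch below, assuming the `ocrmypdf` CLI is installed and Syncthing drops scans into an inbox folder. The paths, the `OCR_LANG` default and the helper names `archive_dir`/`process_inbox` are hypothetical, and filing by processing date rather than by the document's own date is an assumption; this is not necessarily how the setup described above does it.

```bash
#!/usr/bin/env bash
# Sketch: OCR new scans from an inbox folder and file them by year/month.
# INBOX, ARCHIVE and OCR_LANG defaults are hypothetical; adjust to your NAS.
set -euo pipefail

INBOX="${INBOX:-/volume1/scans/inbox}"
ARCHIVE="${ARCHIVE:-/volume1/scans/archive}"
OCR_LANG="${OCR_LANG:-fra}"

# Print the year/month folder under ROOT; STAMP defaults to today (YYYY/MM).
archive_dir() {
  local root="$1" stamp="${2:-$(date +%Y/%m)}"
  printf '%s/%s\n' "$root" "$stamp"
}

# OCR each PDF in the inbox into the archive; delete the original on success.
process_inbox() {
  local src dest_dir
  shopt -s nullglob
  for src in "$INBOX"/*.pdf; do
    dest_dir="$(archive_dir "$ARCHIVE")"
    mkdir -p "$dest_dir"
    # --skip-text leaves pages that already carry a text layer untouched.
    if ocrmypdf -l "$OCR_LANG" --skip-text "$src" "$dest_dir/$(basename "$src")"; then
      rm -- "$src"
    else
      echo "OCR failed for $src; leaving it in the inbox" >&2
    fi
  done
}

# Run only when executed directly (e.g. from cron), not when sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  process_inbox
fi
```

Running it every few minutes from cron (or a file watcher) is one option; a failed OCR run leaves the scan in the inbox so nothing is lost.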

Thank you!

Now I have a new project thanks to you 🙂

If you’re into self-hosting solutions, you might be interested in https://github.com/jonaswinkler/paperless-ng

OCR is included. I don’t use it myself, but I’ve heard a lot of good things about it.

That was one of the many solutions I tried, but I never managed to get it working properly. For reference, these were my requirements:

  • Self-hosted;
  • Quick to import and search PDFs;
  • Uses Tesseract OCR, which to me is the gold standard of open-source OCR;
  • No stored state that cannot be rebuilt from the PDFs themselves (I don’t want to maintain a separate datastore). Some solutions import PDFs, after which you lose sight of them and everything must be done within the tool itself. No thanks.

  • I scan all documents and destroy the paper copies, and download e-bills etc.
  • I sort them into different folders (insurance, pension provision, household, taxes, etc.) on a cloud drive (I don’t use the laptop’s internal drive, so they stay available from every device when I’m not at home).
  • Every time I do a bit of admin work (roughly 1-2 times a month), I duplicate the folders to another cloud and to an external hard drive.
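For the periodic duplication step, a minimal bash sketch could look like this. The helper name `mirror_documents` and the example paths are hypothetical. It only adds or overwrites files and never deletes anything from the backups; `rsync -a` would be the usual choice if you prefer incremental copies.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Copy a documents folder into one or more backup locations.
# Usage: mirror_documents SOURCE DEST [DEST...]
mirror_documents() {
  local src="$1" dest
  shift
  for dest in "$@"; do
    mkdir -p "$dest"
    # "src/." copies the folder's contents (including dotfiles) into dest;
    # -a preserves timestamps, permissions and the subfolder structure.
    cp -a "$src"/. "$dest"/
  done
}
```

Example (hypothetical paths): `mirror_documents ~/Cloud/Documents ~/OtherCloud/Documents /media/external/Documents`.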

Coincidentally, my Ambar server recently started having issues with expired SSL certificates and consuming CPU every 5 minutes, so I started looking for a replacement. I have now replaced it with paperless-ng (a much better maintained fork of the original paperless project) and I’m much happier with it, for the following reasons:

  • it uses far fewer resources when idle;
  • it still leverages the great OCRmyPDF/Tesseract projects;
  • it uses machine learning to automatically recognize a document’s sender, date and type;
  • it still maintains a tree of PDFs on your disk, so you don’t lose access to them if the project stops being maintained for some reason;
  • there’s a ton of community support, with apps for Android/iOS and even a command-line interface(!)

In short, hopefully you haven’t yet gone down the Ambar route, since this is a more future-proof solution. Hope this helps!

Simple: I have a “Finance” share on my Synology NAS with a nightly encrypted backup to the Synology Cloud.

Thanks for the heads-up! Fortunately not: I was on holiday and have only just got back to my PC to tinker, so I’ll give this NG fork a try.

Hi @betterlatethannever
Very good topic! I handle archiving both at work and in my personal life (obviously). I apply the 10-year rule to my private documents as well (if a document is on paper, I file it in a binder; otherwise I archive it on a NAS with backup).
Recently, however, I came across this Lausanne-based startup: Addmin - your intelligent digital filing cabinet
I haven’t tried it (yet), but it might interest anyone for whom managing/filing documents is not their cup of tea 😉
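If the 10-year rule mentioned above means keeping documents for ten years, a small helper can list archived files that have passed that age. This is a sketch under assumptions: it relies on each file’s modification time, which for scans reflects when the file was last written rather than the document’s own date, and the function name `list_expired` and its default are hypothetical. It only lists files; reviewing and deleting them is left to you.

```bash
#!/usr/bin/env bash
set -euo pipefail

# List files whose modification time is older than a retention period.
# Default: 3653 days, i.e. roughly ten years including leap days.
# Usage: list_expired ARCHIVE_ROOT [DAYS]
list_expired() {
  local root="$1" days="${2:-3653}"
  # -mtime +N matches files last modified more than N days ago.
  find "$root" -type f -mtime +"$days" -print
}
```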

Are these self-hosted/server solutions worth it for an individual user?

Can’t you just use your file manager’s built-in folders, tags and search functionality (Finder tags, Spotlight search)? There’s much less complexity and installation involved, and it’s less likely to break after a software update or otherwise malfunction…?

Thank you for sharing this. Just gave paperless-ng a try and I have to say I’m really impressed by the OCR capabilities.

I’ll now start digitizing all my documents and finally get rid of paper as much as possible.

Thanks for the paperless-ng recommendation; it was rather easy to spin up the Docker containers on my QNAP.

$ ocrmypdf -l fra MyInputFile MyOutputFile

MyInputFile and MyOutputFile can have different names or the same name.
The -l option specifies the document’s language (here fra, i.e. French) to improve OCR accuracy.

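To run the same command over a whole folder, here is a minimal bash sketch, assuming `ocrmypdf` is on the PATH. It writes results to a hypothetical `ocr/` subfolder instead of overwriting the originals, and `ocr_folder` is a hypothetical helper name. The `-l` value accepts several Tesseract language codes joined with `+` (e.g. `fra+deu`), provided the matching Tesseract language packs are installed.

```bash
#!/usr/bin/env bash
set -euo pipefail

# OCR every PDF in DIR, writing the results to DIR/ocr/.
# Usage: ocr_folder DIR [LANGS]   (LANGS defaults to French; e.g. "fra+deu")
ocr_folder() {
  local dir="$1" langs="${2:-fra}"
  local out="$dir/ocr" f
  mkdir -p "$out"
  shopt -s nullglob
  for f in "$dir"/*.pdf; do
    # --skip-text leaves pages that already have a text layer as they are.
    ocrmypdf -l "$langs" --skip-text "$f" "$out/$(basename "$f")"
  done
}
```

Example: `ocr_folder ~/Scans fra+deu` for a folder of mixed French and German letters.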