OCR: Found 2 articles

Shell script for ADF scanner Fujitsu SP-1120

Posted by Eric Scheibler on July 3, 2021

Recently I bought a Fujitsu SP-1120 to replace my rather slow, old flatbed scanner. The Fujitsu is a scanner with an automatic document feeder (ADF) and duplex support. It’s faster, scans the front and back of each page in one pass, and produces much better image quality for OCR.

This article describes the installation under Debian Linux and provides a simple scan-to-PDF shell script.


Script to extract text from images and scanned PDF files

Posted by Eric Scheibler on April 13, 2015

For friends of the text console, I’ve created a small shell script that extracts text from images and scanned PDF files. You can specify as many input files as you want; the results are merged into a single text file. You can open that file in your favorite text editor or print it to stdout. Tesseract handles the text recognition.
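
The idea can be sketched roughly as follows. This is not the script from the article, only a minimal illustration under a few assumptions: tesseract and pdftoppm (from poppler-utils) are installed, PDFs are recognized by their file extension, and the function names ocr_image, ocr_pdf and ocr_files are made up for this example.

```shell
#!/bin/sh
# Sketch only: OCR several images or scanned PDFs and merge the text.
# Assumes tesseract and pdftoppm (poppler-utils) are installed.
# All function names below are hypothetical.

# ocr_image FILE: print the recognized text of one image to stdout.
ocr_image() {
    # Passing "stdout" as the output base makes tesseract print the text.
    tesseract "$1" stdout 2>/dev/null
}

# ocr_pdf FILE: render each page to a 300 dpi PNG, then OCR page by page.
ocr_pdf() {
    tmpdir=$(mktemp -d) || return 1
    pdftoppm -r 300 -png "$1" "$tmpdir/page"
    for page in "$tmpdir"/page*.png; do
        # Skip the unexpanded pattern if no pages were rendered.
        if [ -e "$page" ]; then
            ocr_image "$page"
        fi
    done
    rm -rf "$tmpdir"
}

# ocr_files FILE...: OCR every input in the given order, print merged text.
ocr_files() {
    for f in "$@"; do
        case "$f" in
            *.pdf|*.PDF) ocr_pdf "$f" ;;
            *)           ocr_image "$f" ;;
        esac
    done
}
```

After sourcing these functions, a call such as `ocr_files letter.png invoice.pdf > result.txt` would write the merged text to a single file, while omitting the redirection prints it to stdout. The file names here are placeholders.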
