Medak, Sekulic & Mertens 2014
PDF, used because of its purported superiority, but it is far less popular.
As before, the export to PDF can be done with a number of tools. In our case we'll do both the optical
character recognition and the PDF export in gscan2pdf. Again, the proprietary Abbyy FineReader will
produce somewhat smaller PDFs.
If you prefer an e-book format that works better with e-book readers, you will obviously have
to remove some of the elements that appear in the printed book: headers, footers, footnotes and pagination.
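Part of that clean-up can be scripted once the OCR text is available. Below is a minimal sketch in Python, assuming the OCR output has been exported as one plain-text string per page; the function name `strip_running_elements` and the 40% threshold are hypothetical choices, not part of any tool mentioned here. It treats a line as a running header or footer if it opens or closes a large share of pages, and it drops bare page numbers at the top or bottom of each page. Footnotes are not handled and still need manual editing.

```python
import re
from collections import Counter

# A line consisting of nothing but a page number (hypothetical heuristic).
PAGE_NUMBER = re.compile(r"\d{1,4}")


def strip_running_elements(pages: list[str], min_share: float = 0.4) -> list[str]:
    """Remove repeated headers/footers and bare page numbers from page edges.

    pages: OCR text, one string per page. Lines are stripped of surrounding
    whitespace and blank lines are dropped in the returned text.
    """
    split_pages = [[ln.strip() for ln in page.splitlines() if ln.strip()] for page in pages]

    # Count on how many pages each line appears as the first or last line
    # (a set, so a line is counted at most once per page).
    edge_counts = Counter()
    for lines in split_pages:
        if lines:
            edge_counts.update({lines[0], lines[-1]})

    threshold = max(2, round(len(pages) * min_share))
    running = {line for line, n in edge_counts.items() if n >= threshold}

    def is_furniture(line: str) -> bool:
        return line in running or PAGE_NUMBER.fullmatch(line) is not None

    cleaned = []
    for lines in split_pages:
        start, end = 0, len(lines)
        while start < end and is_furniture(lines[start]):  # trim from the top
            start += 1
        while end > start and is_furniture(lines[end - 1]):  # trim from the bottom
            end -= 1
        cleaned.append("\n".join(lines[start:end]))
    return cleaned
```

Odd and even pages often carry different running heads (for example the book title and the chapter title), which is why the threshold sits well below 100% of pages.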
…language. Click 'Start OCR'. Once the OCR is
finished, export the graphic files and the OCR text to PDF by selecting 'Save as'.
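gscan2pdf can use Tesseract as its OCR engine, so the same recognise-and-export step can also be run from a script. A minimal sketch, assuming the `tesseract` command-line tool and the language data you need are installed; the helper names and file names are hypothetical. It relies on two documented Tesseract behaviours: the input may be a text file listing one image path per line (giving a multi-page result), and the `pdf` config name at the end of the command writes a searchable PDF named after the output base. This is an equivalent command-line route, not a description of how gscan2pdf works internally.

```python
import shutil
import subprocess
from pathlib import Path


def write_page_list(pages: list[Path], list_file: Path) -> Path:
    """Write one image path per line; tesseract accepts such a file as its input."""
    list_file.write_text("\n".join(str(p) for p in pages) + "\n", encoding="utf-8")
    return list_file


def tesseract_pdf_command(source: Path, out_base: Path, lang: str = "eng") -> list[str]:
    """Build the command; the trailing 'pdf' config makes tesseract write out_base + '.pdf'."""
    return ["tesseract", str(source), str(out_base), "-l", lang, "pdf"]


def ocr_pages_to_pdf(pages: list[Path], out_base: Path, lang: str = "eng") -> Path:
    """OCR the given .tiff pages into one searchable PDF (needs tesseract on PATH)."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    list_file = write_page_list(pages, out_base.parent / (out_base.name + "-pages.txt"))
    subprocess.run(tesseract_pdf_command(list_file, out_base, lang), check=True)
    return out_base.parent / (out_base.name + ".pdf")
```

For example, `ocr_pages_to_pdf(sorted(Path("out").glob("*.tiff")), Path("book"))` would write `book.pdf`. Tesseract language codes are three-letter codes such as `eng` or `hrv`.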
However, since proprietary solutions sometimes produce better results, these tasks can also
be done in, for instance, Abbyy FineReader running on a Windows operating system inside
VirtualBox. The prerequisite is that you have copies of both Windows and Abbyy FineReader
that you can install in VirtualBox. Once both are installed, designate a shared folder in
VirtualBox and place the .tiff files there. You can then open them in Abbyy FineReader
running in VirtualBox, OCR them and export them to PDF.
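The shared folder can be designated in the VirtualBox GUI, or from the host with VirtualBox's `VBoxManage sharedfolder add` command. A minimal sketch in Python, assuming `VBoxManage` is on the PATH and the Guest Additions are installed in the Windows guest (shared folders depend on them); the VM name, share name and host path are hypothetical, and the command is best run while the virtual machine is powered off.

```python
import shutil
import subprocess
from pathlib import Path


def shared_folder_command(vm_name: str, share_name: str, host_path: Path) -> list[str]:
    """Build a VBoxManage call registering host_path as a shared folder of vm_name.

    --automount asks VirtualBox to mount the share in the guest automatically.
    """
    return ["VBoxManage", "sharedfolder", "add", vm_name,
            "--name", share_name, "--hostpath", str(host_path), "--automount"]


def add_shared_folder(vm_name: str, share_name: str, host_path: Path) -> None:
    """Create the host folder if needed and register it with the VM."""
    if shutil.which("VBoxManage") is None:
        raise RuntimeError("VBoxManage not found on PATH")
    host_path.mkdir(parents=True, exist_ok=True)
    subprocess.run(shared_folder_command(vm_name, share_name, host_path), check=True)
```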
To use Abbyy FineReader, transfer the output files from your 'out' folder to the VirtualBox shared
folder. Then start VirtualBox, boot the Windows image and, in Windows, start Abbyy
FineReader. Open the files and let Abbyy FineReader recognize them. Once it is done, export the
result to PDF.
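The copy step can be scripted on the GNU/Linux host. A minimal sketch, assuming the scans sit in the 'out' folder as .tif/.tiff files; the shared-folder path is a placeholder for whatever folder you designated in VirtualBox, and the function name is hypothetical.

```python
import shutil
from pathlib import Path

TIFF_SUFFIXES = {".tif", ".tiff"}


def copy_tiffs_to_shared(out_dir: Path, shared_dir: Path) -> list[Path]:
    """Copy every .tif/.tiff file from out_dir into shared_dir, keeping file names.

    Files are processed in sorted name order, so the returned list follows page
    order when scans are numbered with zero-padding (001.tiff, 002.tiff, ...).
    """
    shared_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(out_dir.iterdir()):
        if src.is_file() and src.suffix.lower() in TIFF_SUFFIXES:
            dst = shared_dir / src.name
            shutil.copy2(src, dst)  # copy2 also preserves timestamps
            copied.append(dst)
    return copied
```

For example, `copy_tiffs_to_shared(Path("out"), Path.home() / "vbox-shared")` would copy the scans into a hypothetical `vbox-shared` folder in your home directory, ready to be opened from Abbyy FineReader in the guest.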
VI. CATALOGING AND SHARING THE E-BOOK
Your road from a book on paper to an e-book is complete. If you want to maintain your library, you
can use Calibre, a free software tool for …
…drop-down menu 'Tools', select the Tesseract engine and your language, and start the process
- once OCR is finished, output to a PDF: go to 'File' and select 'Save', edit the
metadata, select the format and save
If using non-free software:
2) open Abbyy FineReader in VirtualBox (note: only Abbyy FineReader 10 installs and works, with some limitations, under GNU/Linux)
- transfer the files in the 'out' folder to the folder shared with VirtualBox
- point it to the readied .tiff files and it will comple