FAQ
A collection of frequently asked questions about Tesseract 4.0.0, with answers or pointers to them.
For the older version of the FAQ pertaining to Tesseract 2.0x, 3.0x and 4.00.00alpha, please see FAQ Old.
If your question is not answered by the FAQ, the Wiki pages, or the Issues, please search the users mailing list/forum before posting it there.
If you think you have found a bug in Tesseract, please search the existing issues first. If you find a similar issue, add to it; otherwise, create a new issue.
Read the CONTRIBUTING guide before you report an issue in GitHub or ask a question in the forum.
(Please note, this page is currently being updated for 4.0.0).
See Tesseract Wiki Home page for details.
See the Tesseract man page for the list of languages and scripts supported by Tesseract 4.0.0.
See the Tesseract Wiki Data Files page for information regarding the language models available for Tesseract 4.0.0.
In many cases, getting better OCR results requires improving the quality of the input image you give Tesseract.
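As an illustration of what improving an input image can involve, here is a minimal sketch of one common preprocessing step, binarization with Otsu's method, written in plain Python. It is not taken from Tesseract's code: the function names `otsu_threshold` and `binarize` are hypothetical, and the image is assumed to be a list of rows of 8-bit grayscale values. A real pipeline would normally use an imaging library and may need other steps (rescaling, noise removal, deskewing) instead of or in addition to this one.

```python
def otsu_threshold(pixels):
    """Return a threshold t (0-255) that maximizes between-class variance.

    `pixels` is a flat iterable of integer grayscale values in 0..255.
    Pixels with value <= t form the dark class, the rest the light class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0    # number of pixels in the dark class so far
    sum_dark = 0  # sum of their intensities
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue  # no dark pixels yet
        w_light = total - w_dark
        if w_light == 0:
            break     # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance, up to a constant factor of 1 / total**2.
        var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(image):
    """Map each pixel of a list-of-rows grayscale image to 0 or 255."""
    t = otsu_threshold(p for row in image for p in row)
    return [[255 if p > t else 0 for p in row] for row in image]


if __name__ == "__main__":
    # Tiny made-up image: three dark pixels and three light pixels.
    sample = [[30, 40, 200],
              [35, 210, 220]]
    for row in binarize(sample):
        print(row)
```

Whether binarization helps depends on the image; it is shown here only as one example of the kind of cleanup that the advice above refers to.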
Tesseract can be trained to recognize other languages, and existing language models can be fine-tuned. See the Tesseract Wiki page Training Tesseract 4.00 for information on training the LSTM engine.
Please note that LSTM training is currently supported only with synthetic images, which are created by rendering a UTF-8 training text with Unicode fonts.
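To make the requirement above more concrete, the following sketch checks that a training text is valid UTF-8 and assembles a command line for `text2image`, the rendering tool that ships with Tesseract's training tools. The helper names, file paths, and font name are hypothetical placeholders, and the sketch only builds the argument list; it does not run the tool. Consult the training documentation for the options your version actually needs.

```python
from typing import List


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def build_text2image_command(text_path: str, font: str,
                             fonts_dir: str, outputbase: str) -> List[str]:
    """Assemble an argument list for rendering a training text to an image.

    All four arguments are placeholders supplied by the caller; nothing
    here verifies that the font exists or that text2image is installed.
    """
    return [
        "text2image",
        f"--text={text_path}",
        f"--font={font}",
        f"--fonts_dir={fonts_dir}",
        f"--outputbase={outputbase}",
    ]


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    sample_text = "Example training line with accents: café, naïve".encode("utf-8")
    if is_valid_utf8(sample_text):
        cmd = build_text2image_command(
            "training_text.txt", "Example Font", "/path/to/fonts", "out/sample"
        )
        print(" ".join(cmd))
```

The list form can be passed to `subprocess.run` unchanged, which avoids shell quoting problems with font names that contain spaces.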
Try searching the forum: http://groups.google.com/group/tesseract-ocr as well as open and closed issues on GitHub: https://github.com/tesseract-ocr/tesseract/issues, as your question may have come up before even if it is not listed here.
Old wiki - no longer maintained. The pages were moved; see the new documentation.
All pages were moved to tesseract-ocr/tessdoc.
The latest documentation is available at https://tesseract-ocr.github.io/.