Diffstat (limited to 'graphics/ocropus/ocroscript.1')
-rw-r--r-- | graphics/ocropus/ocroscript.1 | 43 |
1 files changed, 43 insertions, 0 deletions
diff --git a/graphics/ocropus/ocroscript.1 b/graphics/ocropus/ocroscript.1
new file mode 100644
index 0000000000..d8087203f7
--- /dev/null
+++ b/graphics/ocropus/ocroscript.1
@@ -0,0 +1,43 @@
+.TH ocroscript 1 "June 06, 2008"
+.SH NAME
+ocropus \- command line OCR tool
+.SH SYNOPSIS
+.B ocroscript
+.RI "<script> <arguments>"
+.SH DESCRIPTION
+You can see a list of all available commands by looking in the $OCROSCRIPTS
+(/usr/share/ocropus/scripts/ by default) path.
+.PP
+The \(oqrecognize\(cq script uses Tesseract for recognition and sends the HTML-based hOCR
+output to stdout. Tesseract is probably the most mature text recognizer within
+OCRopus at the moment. Natively, Tesseract doesn't do layout analysis, but
+combined with OCRopus, it makes for a pretty good OCR system:
+.RS
+$ ocroscript recognize page.png > page.html
+.RE
+.PP
+Here is a brief summary of the remaining command line commands available.
+You will need to look at the script to see what the command line arguments are:
+.TP
+degrade.lua
+Simple document image degradation.
+.TP
+hocr-to-text.lua
+Convert hOCR output to plain text.
+.TP
+line-clean.lua
+Given a line image, remove marginal noise and fix some other problems.
+.TP
+sauvola.lua
+Perform Sauvola thresholding.
+.SH SEE ALSO
+.BR tesseract (1),
+.br
+.PP
+.UR http://code.google.com/p/ocropus/w/list
+.UE
+.SH AUTHOR
+ocroscript was written by Thomas Breuel.
+.PP
+This manual page was written by Jeffrey Ratcliffe <Jeffrey.Ratcliffe@gmail.com>,
+for the Debian project (but may be used by others).
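
The DESCRIPTION added by this patch says the available commands can be found by looking in $OCROSCRIPTS. A minimal shell sketch of that lookup follows. It assumes the scripts are the `.lua` files in that directory (as the names listed in the man page suggest) and uses the default path quoted there as a fallback. `list_ocroscripts` is a hypothetical helper name, not something shipped with OCRopus.

```shell
# Hypothetical helper: print the script names found in $OCROSCRIPTS,
# falling back to the default path quoted in the man page.
list_ocroscripts() {
    dir="${OCROSCRIPTS:-/usr/share/ocropus/scripts}"
    for f in "$dir"/*.lua; do
        # Skip the unexpanded glob when the directory has no .lua files.
        [ -e "$f" ] || continue
        # Strip the directory and the .lua suffix (assumed naming scheme).
        basename "$f" .lua
    done
}
```

Calling `list_ocroscripts` prints one name per line. Whether each name can be passed to `ocroscript` without the `.lua` suffix is an assumption based on the `ocroscript recognize` example in the man page.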