Diffstat (limited to 'graphics/ocropus/ocroscript.1')
-rw-r--r-- graphics/ocropus/ocroscript.1 43
1 files changed, 43 insertions, 0 deletions
diff --git a/graphics/ocropus/ocroscript.1 b/graphics/ocropus/ocroscript.1
new file mode 100644
index 0000000000..d8087203f7
--- /dev/null
+++ b/graphics/ocropus/ocroscript.1
@@ -0,0 +1,43 @@
+.TH ocroscript 1 "June 06, 2008"
+.SH NAME
+ocroscript \- command line OCR tool
+.SH SYNOPSIS
+.B ocroscript
+.RI "<script> <arguments>"
+.SH DESCRIPTION
+To see a list of all available scripts, look in the directory named by $OCROSCRIPTS
+(/usr/share/ocropus/scripts/ by default).
+.PP
+The \(oqrecognize\(cq script uses Tesseract for recognition and sends the HTML-based hOCR
+output to stdout. Tesseract is probably the most mature text recognizer within
+OCRopus at the moment. Natively, Tesseract doesn't do layout analysis, but
+combined with OCRopus, it makes for a pretty good OCR system:
+.RS
+$ ocroscript recognize page.png > page.html
+.RE
+.PP
+Here is a brief summary of the remaining scripts.
+Look at each script to see which command line arguments it takes:
+.TP
+degrade.lua
+Simple document image degradation.
+.TP
+hocr-to-text.lua
+Convert hOCR output to plain text.
+.TP
+line-clean.lua
+Given a line image, remove marginal noise and fix some other problems.
+.TP
+sauvola.lua
+Perform Sauvola thresholding.
+.SH SEE ALSO
+.BR tesseract (1)
+.br
+.PP
+.UR http://code.google.com/p/ocropus/w/list
+.UE
+.SH AUTHOR
+ocroscript was written by Thomas Breuel.
+.PP
+This manual page was written by Jeffrey Ratcliffe <Jeffrey.Ratcliffe@gmail.com>,
+for the Debian project (but may be used by others).
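
The DESCRIPTION in the man page above says the available scripts can be found by looking in $OCROSCRIPTS, with /usr/share/ocropus/scripts/ as the default. A minimal sketch of that lookup follows. It is not part of the patch or of ocroscript itself: the helper names script_dir and list_scripts are hypothetical, and the code assumes $OCROSCRIPTS names a single directory and that scripts carry the .lua suffix, as the ones listed in the man page do.

```python
import os
from pathlib import Path

# Default quoted from the man page text; an installed package may use another path.
DEFAULT_SCRIPT_DIR = "/usr/share/ocropus/scripts/"


def script_dir(environ=None):
    """Return the directory assumed to hold the ocroscript scripts.

    Assumption: $OCROSCRIPTS holds one directory, not a search path list.
    """
    environ = os.environ if environ is None else environ
    return Path(environ.get("OCROSCRIPTS") or DEFAULT_SCRIPT_DIR)


def list_scripts(environ=None):
    """Return the sorted names of the .lua files in the script directory."""
    directory = script_dir(environ)
    if not directory.is_dir():
        # Nothing installed, or the variable points somewhere that does not exist.
        return []
    return sorted(p.name for p in directory.glob("*.lua") if p.is_file())


if __name__ == "__main__":
    for name in list_scripts():
        print(name)
```

Passing a plain dictionary instead of the real environment makes the lookup easy to try against a scratch directory before pointing it at an installed OCRopus.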