From 4c4269a4c7d788c7b73f60423aaf01cda72d4f8b Mon Sep 17 00:00:00 2001
From: Hunter Sezen
Date: Tue, 19 Dec 2017 21:43:25 +0000
Subject: libraries/libexttextcat: Updated for version 3.4.5.

Signed-off-by: David Spencer
---
 libraries/libexttextcat/README | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libraries/libexttextcat/README b/libraries/libexttextcat/README
index 3b9743c04a..9332783b6e 100644
--- a/libraries/libexttextcat/README
+++ b/libraries/libexttextcat/README
@@ -3,7 +3,7 @@ classification technique described in Cavnar & Trenkle, "N-Gram-Based
 Text Categorization". It was primarily developed for language
 guessing, a task on which it is known to perform with near-perfect
 accuracy.
-
+
 The central idea of the Cavnar & Trenkle technique is to calculate a
 "fingerprint" of a document with an unknown category, and compare this
 with the fingerprints of a number of documents of which the categories
@@ -12,7 +12,7 @@ classification. A fingerprint is a list of the most frequent n-grams
 occurring in a document, ordered by frequency. Fingerprints are
 compared with a simple out-of-place metric. See the article for more
 details.
-
+
 Considerable effort went into making this implementation fast and
 efficient. The language guesser processes over 100 documents/second on
 a simple PC, which makes it practical for many uses. It was developed
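The README text touched by this patch describes the Cavnar & Trenkle approach only in outline: build a frequency-ordered list of the most common n-grams (the "fingerprint") and compare fingerprints with an out-of-place metric. The following is a minimal Python sketch of that idea under stated assumptions; it is not libexttextcat's C API or its exact algorithm. The n-gram range (1 to 5), the profile size of 300, the `_` word padding, the rule that an n-gram missing from a category profile costs the profile length, and the helper names `fingerprint`, `out_of_place` and `classify` are all choices made for illustration.

```python
from collections import Counter


def fingerprint(text: str, max_n: int = 5, top_k: int = 300) -> list[str]:
    """Return the top_k most frequent character n-grams (n = 1..max_n).

    Assumed details (not taken from libexttextcat): text is lowercased,
    split on whitespace, and each word is padded with '_' on both sides.
    """
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i : i + n]] += 1
    # Most frequent first; ties broken alphabetically so output is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_fp: list[str], cat_fp: list[str]) -> int:
    """Sum of rank differences between two fingerprints.

    An n-gram absent from cat_fp gets a maximum penalty, assumed here
    to be len(cat_fp).
    """
    cat_rank = {gram: rank for rank, gram in enumerate(cat_fp)}
    max_penalty = len(cat_fp)
    total = 0
    for rank, gram in enumerate(doc_fp):
        if gram in cat_rank:
            total += abs(rank - cat_rank[gram])
        else:
            total += max_penalty
    return total


def classify(text: str, categories: dict[str, list[str]]) -> str:
    """Return the category whose fingerprint is closest to the text's."""
    doc_fp = fingerprint(text)
    return min(categories, key=lambda name: out_of_place(doc_fp, categories[name]))


if __name__ == "__main__":
    # Toy profiles built from single sentences; real profiles use large corpora.
    profiles = {
        "english": fingerprint("the quick brown fox jumps over the lazy dog and the cat"),
        "german": fingerprint("der schnelle braune fuchs springt über den faulen hund und die katze"),
    }
    print(classify("the lazy dog and the brown fox", profiles))
```

Picking the category with the smallest out-of-place distance is the nearest-profile rule the README alludes to; a production guesser would also need a threshold for reporting "unknown" when no category is close enough.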