Java Text Categorizing Library
What's that?
The Java Text Categorizing Library (JTCL) is a pure Java 1.5 implementation of libTextCat, which in turn is "a library that was primarily developed for language guessing, a task on which it is known to perform with near-perfect accuracy". It is distributed under the LGPL and can also be used to categorize text into arbitrary topics by computing appropriate fingerprints that represent the categories.
The text analysis algorithm is based on the classification technique described in Cavnar & Trenkle, "N-Gram-Based Text Categorization".
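To give a rough idea of how this kind of classification works, here is a minimal sketch of n-gram rank profiles and the "out-of-place" distance from the Cavnar & Trenkle paper. It is not the JTCL API. The class name NGramSketch, its method names, the profile size of 300 and the single "_" padding character are all assumptions made for this illustration, and the code assumes Java 8 or later rather than Java 1.5.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch of n-gram rank profiles; hypothetical names, not the JTCL API. */
class NGramSketch {

    /** Maximum number of n-grams kept in a profile (an assumed cut-off). */
    static final int PROFILE_SIZE = 300;

    /** Builds a profile: n-grams (n = 1..5) ordered from most to least frequent. */
    static List<String> profile(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // Split on anything that is not a letter or an apostrophe.
        for (String token : text.toLowerCase().split("[^\\p{L}']+")) {
            if (token.isEmpty()) continue;
            String padded = "_" + token + "_"; // mark word boundaries
            for (int n = 1; n <= 5; n++) {
                for (int i = 0; i + n <= padded.length(); i++) {
                    counts.merge(padded.substring(i, i + n), 1, Integer::sum);
                }
            }
        }
        List<String> ranked = new ArrayList<>(counts.keySet());
        // Sort by descending count; break ties alphabetically for a stable result.
        ranked.sort((a, b) -> {
            int byCount = counts.get(b) - counts.get(a);
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return ranked.size() > PROFILE_SIZE
                ? new ArrayList<>(ranked.subList(0, PROFILE_SIZE))
                : ranked;
    }

    /** Out-of-place distance between a document profile and a category profile. */
    static int distance(List<String> document, List<String> category) {
        Map<String, Integer> rank = new HashMap<>();
        for (int i = 0; i < category.size(); i++) rank.put(category.get(i), i);
        int total = 0;
        for (int i = 0; i < document.size(); i++) {
            Integer r = rank.get(document.get(i));
            // N-grams missing from the category get a fixed maximum penalty.
            total += (r == null) ? category.size() : Math.abs(r - i);
        }
        return total;
    }

    /** Returns the name of the category whose profile is closest to the text. */
    static String classify(String text, Map<String, List<String>> categories) {
        List<String> doc = profile(text);
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (Map.Entry<String, List<String>> e : categories.entrySet()) {
            int d = distance(doc, e.getValue());
            if (d < bestDistance) {
                bestDistance = d;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

In this sketch a category fingerprint is simply the profile of some sample text for that category: build one with NGramSketch.profile(sampleText) per language or topic, put the results in a map keyed by category name, and pass the map to NGramSketch.classify. Results will only be meaningful with reasonably large sample texts.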
The library was developed at Knallgrau New Media Solutions and is currently in use at tagthe.net, a web service that extracts meta information from a given resource. There, the JTCL is used to determine the resource's language, and it is planned to be used for topic determination as well in the near future.
Release 1.0
Supported languages
- albanian
- danish
- dutch
- english
- finnish
- french
- german
- hungarian
- italian
- norwegian
- polish
- slovakian
- slovenian
- spanish
- swedish