org.knallgrau.utils.textcat
Class TextCategorizer

java.lang.Object
  extended by org.knallgrau.utils.textcat.TextCategorizer

public class TextCategorizer
extends java.lang.Object


Constructor Summary
TextCategorizer()
           
TextCategorizer(java.lang.String confFile)
          Creates a new TextCategorizer with the given configuration file. The configuration file maps the paths of FingerPrint files to category names, which are used to categorize the texts passed to the TextCategorizer.
 
Method Summary
 java.lang.String categorize(java.lang.String text)
          Categorizes the text passed to it.
 java.lang.String categorize(java.lang.String text, int limit)
          Categorizes only a limited number of characters of the text. Recommended for large texts to improve performance.
 java.util.Map<java.lang.String,java.lang.Integer> getCategoryDistances(java.lang.String text)
          Categorizes a text but, instead of a single category, returns a map containing all categories and their distances to the text.
static void main(java.lang.String[] args)
          Reads from stdin until EOF, prints the determined category of the input, and then terminates.
 void setConfFile(java.lang.String confFile)
          Sets the configuration file path.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TextCategorizer

public TextCategorizer()

TextCategorizer

public TextCategorizer(java.lang.String confFile)
Creates a new TextCategorizer with the given configuration file. The configuration file maps the paths of FingerPrint files to category names, which are used to categorize the texts passed to the TextCategorizer.

Parameters:
confFile - the path to the configuration file

Method Detail

setConfFile

public void setConfFile(java.lang.String confFile)
Sets the configuration file path.

Parameters:
confFile - the path to the configuration file

categorize

public java.lang.String categorize(java.lang.String text)
Categorizes the text passed to it.

Parameters:
text - text to be categorized
Returns:
the category name given in the configuration file

categorize

public java.lang.String categorize(java.lang.String text,
                                   int limit)
Categorizes only a limited number of characters of the text. Recommended for large texts to improve performance.

Parameters:
text - text to be analysed
limit - number of characters to be analysed
Returns:
the category name given in the configuration file
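
The documentation does not say which characters are analysed when a limit is given. The sketch below assumes the limit applies to the leading characters and shows an equivalent truncation done on the caller's side. `TextHead` and `head` are hypothetical names for illustration, not part of the library.

```java
public class TextHead {

    // Returns at most `limit` leading characters of `text`.
    // Assumption: the limit applies to the start of the text.
    public static String head(String text, int limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must not be negative");
        }
        return text.length() <= limit ? text : text.substring(0, limit);
    }

    public static void main(String[] args) {
        // Build a large sample text to stand in for a long document.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("The quick brown fox jumps over the lazy dog. ");
        }
        String sample = head(sb.toString(), 500);
        System.out.println(sample.length());
    }
}
```

If the assumption holds, passing the truncated sample to categorize(text) should be comparable to calling categorize(text, limit) on the full text.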

getCategoryDistances

public java.util.Map<java.lang.String,java.lang.Integer> getCategoryDistances(java.lang.String text)
Categorizes a text but, instead of a single category, returns a map containing all categories and their distances to the text.

Parameters:
text - text to be categorized
Returns:
a map (HashMap) with category names as keys and distances as values
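
A typical use of the returned map is to pick the best match or to rank candidates. The sketch below is not part of the library. It assumes that a smaller distance means a closer match, and the `ClosestCategory` class, its `closest` method, and the sample values are hypothetical stand-ins for a real result of getCategoryDistances.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ClosestCategory {

    // Returns the key with the smallest distance, or null for an empty map.
    // Assumption: a smaller distance means a closer match.
    public static String closest(Map<String, Integer> distances) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> entry : distances.entrySet()) {
            if (best == null || entry.getValue() < bestDistance) {
                best = entry.getKey();
                bestDistance = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up values standing in for the result of getCategoryDistances(text).
        Map<String, Integer> distances = new LinkedHashMap<>();
        distances.put("english", 1200);
        distances.put("german", 950);
        distances.put("french", 1430);
        System.out.println(closest(distances));
    }
}
```

Keeping the full map also makes it possible to compare the best and second-best distances, for example to reject a result when the two are nearly equal.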

main

public static void main(java.lang.String[] args)
Reads from stdin until EOF, prints the determined category of the input, and then terminates.

Parameters:
args -
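
Reading standard input until EOF can be sketched as follows. This is an illustration of the behaviour described above, not the library's actual implementation. `ReadAll` and `readAll` are hypothetical names, and UTF-8 decoding is an assumption.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ReadAll {

    // Reads characters from `in` until EOF and returns them as one string.
    public static String readAll(Reader in) {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[4096];
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: the input is UTF-8; the library's own main may decode differently.
        String text = readAll(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        // With the library on the classpath, the text would then be passed to
        // categorize(text) and the resulting category printed.
        System.out.println(text.length() + " characters read");
    }
}
```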