LanguageTools Unknown Word Lemmatizer:
Output Syntax Description

Java-Based Results

The Java API provides a convenient way to get and manage the results of a query. See the Javadoc of the IAnswer interface for more information on how to use it in your code.

All Java API answers can be converted to string-based answers using the methods of the APIConversions class.

String-Based Results

If there is no result, the API function returns null.

If there is more than one result, each result is delivered in a separate string element of the resulting array of strings.
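
A minimal sketch of how calling code might handle both cases. The class ResultJoiner and the method joinResults are hypothetical and not part of the API; the String[] argument stands in for whatever the string-based API function returns. The elements are joined without a separator, on the assumption that each element already carries its own line endings.

```java
import java.util.Optional;

// Hypothetical helper, not part of the LanguageTools API.
final class ResultJoiner {

    // Returns the overall result string, or an empty Optional
    // when the API function returned null (no result).
    static Optional<String> joinResults(String[] apiResult) {
        if (apiResult == null) {
            return Optional.empty();
        }
        // Assumption: elements are concatenated as-is, without a separator.
        return Optional.of(String.join("", apiResult));
    }
}
```

Wrapping the null case in an Optional is a design choice of this sketch; it only forces the caller to deal with the "no result" case explicitly.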

Overall result format (concatenation of all strings in the array of strings):

result       ::=  {citation EOL feature-set 
                  {"," EOL feature-set}} EOL.
feature-set  ::=  {feature-pair}.
feature-pair ::=  "(" attribute value ")".
citation     ::=  string.
attribute    ::=  string.
value        ::=  string.

The result of a query consists of a sequence of citation forms. Each citation form is followed by one or more feature sets, and each feature set is a combination of attribute-value pairs.
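
The format can be read back into a data structure with a few lines of code. The following is a sketch only: the class LemmaResultParser and its parse method are hypothetical names, not part of the API. It assumes that a citation form stands on a line of its own, that each feature set occupies one line beginning with "(", and that attributes contain no whitespace.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the string-based result format; not part of the API.
final class LemmaResultParser {

    // Matches one feature-pair: "(" attribute value ")".
    private static final Pattern FEATURE_PAIR =
            Pattern.compile("\\(\\s*(\\S+)\\s+([^)]*?)\\s*\\)");

    // Maps each citation form to its feature sets (attribute -> value),
    // keeping the order in which they appear in the result.
    static Map<String, List<Map<String, String>>> parse(String result) {
        Map<String, List<Map<String, String>>> parsed = new LinkedHashMap<>();
        String citation = null;
        for (String rawLine : result.split("\\R")) {
            String line = rawLine.trim();
            if (line.isEmpty() || line.equals(",")) {
                continue; // blank line or stray separator
            }
            if (line.startsWith("(")) {
                if (citation == null) {
                    continue; // feature set without a citation form: skipped
                }
                Map<String, String> featureSet = new LinkedHashMap<>();
                Matcher m = FEATURE_PAIR.matcher(line);
                while (m.find()) {
                    featureSet.put(m.group(1), m.group(2));
                }
                parsed.get(citation).add(featureSet);
            } else {
                citation = line;
                parsed.computeIfAbsent(citation, k -> new ArrayList<>());
            }
        }
        return parsed;
    }
}
```

With this sketch, parse("singen\n  (Cat V)\n").get("singen").get(0).get("Cat") yields "V". A trailing "," after a feature set is ignored, because only the parenthesized pairs are matched.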

The lemmatizer only considers "Cat" and "Flach" features.

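The syntax is the same for lexicalized entries and for entries produced by unknown word analysis. Because the output itself does not mark the difference, the two kinds of results are distinguished by the API function used: there is one function call for lexicalized words and another for unknown words.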


Here is part of the output of the integration test programs:

Lexicalized Word Function Call
query   -> sang
result  -> sang
             (Cat N)
           singen
             (Cat V)

query   -> sang   Filter: (Cat V)
result  -> singen
              (Cat V)

query   -> saenger
result  -> sänger
              (Cat N)(Flach auml)

Unknown Word Function Call
query   -> aufsinken
result  -> aufsinken
               (Cat V)

query   -> aufgesunken
result  -> aufsinken
               (Cat V)

query   -> skandalgeschüttelten
result  -> skandalgeschüttelt
               (Cat A)
           
query   -> abbausicheres
result  -> abbausicher
               (Cat A)

Feature Elements

There are only two feature attributes in the lemmatizer: Cat and Flach. "Cat" stands for category. "Flach" tags forms that, according to the dictionary, do not exist. These forms are nevertheless recognized by the Unknown Word Lemmatizer because they are the forms that result when text is entered without a language-specific keyboard. For example, Kaese is the "Flach"-attributed version of the German word Käse.



Attribute  Value     Meaning
Cat        N         Noun
           A         Adjective
           V         Verb
           Art       Article
           Pron      Pronoun
           Adv       Adverb
           Prep      Preposition
           Conjunct  Conjunction
           Interj    Interjection
           Number    Number
           NCF       Neoclassical form
           Letter    Letter
Flach      auml      Same meaning as the HTML entities of the same name
           ouml      (applies to all Flach values)
           uuml
           agrave
           ograve
           ugrave
           aacute
           oacute
           uacute
           acirc
           ocirc
           ucirc
           ccedil
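
Since the Flach values carry the same meaning as the HTML entities, client code can map them to the characters they name. The following is a sketch under that assumption; the class FlachValues and the method toChar are hypothetical names and not part of the API.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical lookup table for Flach values; not part of the API.
final class FlachValues {

    private static final Map<String, Character> ENTITY_TO_CHAR;

    static {
        Map<String, Character> m = new LinkedHashMap<>();
        m.put("auml", '\u00e4');   // a with diaeresis
        m.put("ouml", '\u00f6');   // o with diaeresis
        m.put("uuml", '\u00fc');   // u with diaeresis
        m.put("agrave", '\u00e0'); // a with grave
        m.put("ograve", '\u00f2'); // o with grave
        m.put("ugrave", '\u00f9'); // u with grave
        m.put("aacute", '\u00e1'); // a with acute
        m.put("oacute", '\u00f3'); // o with acute
        m.put("uacute", '\u00fa'); // u with acute
        m.put("acirc", '\u00e2');  // a with circumflex
        m.put("ocirc", '\u00f4');  // o with circumflex
        m.put("ucirc", '\u00fb');  // u with circumflex
        m.put("ccedil", '\u00e7'); // c with cedilla
        ENTITY_TO_CHAR = Collections.unmodifiableMap(m);
    }

    // Returns the character named by a Flach value, or null if the value is unknown.
    static Character toChar(String flachValue) {
        return ENTITY_TO_CHAR.get(flachValue);
    }
}
```

For example, FlachValues.toChar("auml") returns the character U+00E4, the umlaut that the input form Kaese replaces with "ae".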