Languagetools Unknown Word Lemmatizer: Output Syntax Description
Java-Based Results
The Java API provides a convenient way to retrieve and manage the results of a query. See the IAnswer interface Javadoc for details on how to use it in your code.
All Java API answers can be converted to string-based answers using the methods of the APIConversions class.
String-Based Results
If there is no result, the API function returns null.
If there is more than one result, each result is delivered in a separate string element of the returned array of strings.
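Because a missing result is signalled by null rather than by an empty array, calling code needs a null check before iterating. The following is a minimal sketch of such a guard; the class and method names (ResultArrays, toList, concatenate) are hypothetical helpers and not part of the lemmatizer API, and the sample array contents are assumed for illustration only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the lemmatizer API.
class ResultArrays {

    /** Returns the results as a list; a null array (no result) becomes an empty list. */
    public static List<String> toList(String[] results) {
        if (results == null) {
            return Collections.emptyList();
        }
        return Arrays.asList(results);
    }

    /** Concatenates all array elements into one overall result string. */
    public static String concatenate(String[] results) {
        return String.join("", toList(results));
    }

    public static void main(String[] args) {
        String[] none = null; // what a string-based call returns when there is no result
        String[] two = {"sang\n(Cat N)", "singen\n(Cat V)"}; // assumed element contents

        System.out.println(toList(none).size());
        System.out.println(toList(two).size());
    }
}
```

Mapping null to an empty list lets the rest of the code use a plain for-each loop without a special case.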
Overall result format (concatenation of all strings in the array of strings):
result ::= {citation EOL feature-set
{"," EOL feature-set}} EOL.
feature-set ::= {feature-pair}.
feature-pair ::= "(" attribute value ")".
citation ::= string.
attribute ::= string.
value ::= string.
The result of a query consists of a sequence of citation forms. Each single citation form is followed by one or more combinations of features.
The lemmatizer only considers "Cat" and "Flach" features.
The syntax is the same for lexicalized entries and for entries produced by unknown word analysis. The two kinds of results are distinguished by the API function that is called: there is one function call for lexicalized words and another for unknown words.
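The grammar above can be read back into a small data structure. The sketch below is only an illustration under stated assumptions: it assumes that one string holds one citation form on its first line, that EOL is "\n" or "\r\n", and that attributes and values contain no whitespace or parentheses. ResultParser and Entry are hypothetical names, not classes of the lemmatizer API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for one string-based result, not part of the lemmatizer API.
class ResultParser {

    // Matches one feature-pair: "(" attribute value ")"
    private static final Pattern PAIR =
            Pattern.compile("\\(\\s*(\\S+)\\s+([^()\\s]+)\\s*\\)");

    /** One citation form together with its feature sets. */
    static final class Entry {
        final String citation;
        final List<Map<String, String>> featureSets = new ArrayList<>();

        Entry(String citation) {
            this.citation = citation;
        }
    }

    /** Parses "citation EOL feature-set {"," EOL feature-set}" from a single string. */
    public static Entry parse(String result) {
        String[] lines = result.trim().split("\\r?\\n");
        Entry entry = new Entry(lines[0].trim());
        // Every following line is treated as one feature-set.
        for (int i = 1; i < lines.length; i++) {
            Map<String, String> set = new LinkedHashMap<>();
            Matcher m = PAIR.matcher(lines[i]);
            while (m.find()) {
                set.put(m.group(1), m.group(2));
            }
            if (!set.isEmpty()) {
                entry.featureSets.add(set);
            }
        }
        return entry;
    }

    public static void main(String[] args) {
        // "s\u00e4nger" is the citation form with a-umlaut, written as an escape.
        Entry e = parse("s\u00e4nger\n(Cat N)(Flach auml)");
        System.out.println(e.citation + " " + e.featureSets);
    }
}
```

The parser does not handle null input; callers are expected to check for the null "no result" case first.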
Here is part of the output of the integration test programs:
Lexicalized Word Function Call
query -> sang
result -> sang
(Cat N)
singen
(Cat V)
query -> sang Filter: (Cat V)
result -> singen
(Cat V)
query -> saenger
result -> sänger
(Cat N)(Flach auml)
Unknown Word Function Call
query -> aufsinken
result -> aufsinken
(Cat V)
query -> aufgesunken
result -> aufsinken
(Cat V)
query -> skandalgeschüttelten
result -> skandalgeschüttelt
(Cat A)
query -> abbausicheres
result -> abbausicher
(Cat A)
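The "Filter: (Cat V)" query above shows filtering done by the lemmatizer itself. Where only unfiltered string results are at hand, a comparable selection could be made on the client side. The sketch below assumes that each array element contains one citation form with its feature pairs written exactly as "(Cat X)"; CatFilter and keepCategory are hypothetical names and not part of the API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side filter, not part of the lemmatizer API.
class CatFilter {

    /** Keeps only those result strings that carry the feature pair (Cat <cat>). */
    public static List<String> keepCategory(String[] results, String cat) {
        List<String> kept = new ArrayList<>();
        if (results == null) {
            return kept; // null means "no result"
        }
        String pair = "(Cat " + cat + ")";
        for (String result : results) {
            if (result.contains(pair)) {
                kept.add(result);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumed element contents, modelled on the "sang" example.
        String[] sang = {"sang\n(Cat N)", "singen\n(Cat V)"};
        for (String result : keepCategory(sang, "V")) {
            System.out.println(result);
        }
    }
}
```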
Feature Elements
There are only two feature attributes in the lemmatizer: Cat and Flach. "Cat" refers to the category, whereas "Flach" tags forms that, according to the dictionary, do not exist. These forms are nevertheless recognized by the Unknown Word Lemmatizer because they are the valid forms that result when text is entered without a language-specific keyboard. For example, Kaese is the "Flach"-attributed version of the German word Käse.
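| Attribute | Value | Meaning |
| Cat | N | Noun |
| | A | Adjective |
| | V | Verb |
| | Art | Article |
| | Pron | Pronoun |
| | Adv | Adverb |
| | Prep | Preposition |
| | Conjunct | Conjunction |
| | Interj | Interjection |
| | Number | Number |
| | NCF | Neoclassical form |
| | Letter | Letter |
| Flach | auml | Same meaning as the HTML entity (ä) |
| | ouml | Same meaning as the HTML entity (ö) |
| | uuml | Same meaning as the HTML entity (ü) |
| | agrave | Same meaning as the HTML entity (à) |
| | ograve | Same meaning as the HTML entity (ò) |
| | ugrave | Same meaning as the HTML entity (ù) |
| | aacute | Same meaning as the HTML entity (á) |
| | oacute | Same meaning as the HTML entity (ó) |
| | uacute | Same meaning as the HTML entity (ú) |
| | acirc | Same meaning as the HTML entity (â) |
| | ocirc | Same meaning as the HTML entity (ô) |
| | ucirc | Same meaning as the HTML entity (û) |
| | ccedil | Same meaning as the HTML entity (ç) |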
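Since the Flach values carry the same meaning as the HTML entity names, they can be mapped to the characters they stand for. The following sketch shows one possible lookup table; FlachValues and charFor are hypothetical names that do not come from the lemmatizer API, and the characters are written as Unicode escapes to keep the source ASCII-only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical lookup from Flach values to characters, not part of the lemmatizer API.
class FlachValues {

    private static final Map<String, Character> CHARS = new LinkedHashMap<>();

    static {
        CHARS.put("auml", '\u00e4');   // a with umlaut
        CHARS.put("ouml", '\u00f6');   // o with umlaut
        CHARS.put("uuml", '\u00fc');   // u with umlaut
        CHARS.put("agrave", '\u00e0'); // a with grave accent
        CHARS.put("ograve", '\u00f2'); // o with grave accent
        CHARS.put("ugrave", '\u00f9'); // u with grave accent
        CHARS.put("aacute", '\u00e1'); // a with acute accent
        CHARS.put("oacute", '\u00f3'); // o with acute accent
        CHARS.put("uacute", '\u00fa'); // u with acute accent
        CHARS.put("acirc", '\u00e2');  // a with circumflex
        CHARS.put("ocirc", '\u00f4');  // o with circumflex
        CHARS.put("ucirc", '\u00fb');  // u with circumflex
        CHARS.put("ccedil", '\u00e7'); // c with cedilla
    }

    /** Returns the character a Flach value stands for, or empty if the value is unknown. */
    public static Optional<Character> charFor(String flachValue) {
        return Optional.ofNullable(CHARS.get(flachValue));
    }

    public static void main(String[] args) {
        System.out.println(charFor("auml").isPresent());
        System.out.println(charFor("unknown").isPresent());
    }
}
```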