Available Datasets

The following datasets are available for all types of Analyzers, Generators and Analyzer/Generators currently offered for Italian:

  • Evaluation license: all nouns, verbs, adjectives and adverbs starting with the letters 'a' to 'c', plus all other types of entries (for a total of about 13'000 entries).
  • Full license: all currently available Italian words (more than 50'000 at present), including analysis of contraction elements.

The following datasets are available for the Italian Wordformation Analyzer/Generator:

  • Evaluation license: all derivation-level entries for nouns, verbs, adjectives and adverbs starting with the letters 'a' to 'c', plus all other types of entries (a total of about 10'000 relations).
  • Full license: all derivation-level entries for all entries (currently 40'000 relations).

Language-Specific Features

Your client application needs to take the following Italian-specific features into account in order to make the best use of our data analyzers.

Attribute      Meaning
Contraction    Contractions of elements, usually clitics


The Italian version can recognize and analyze cliticized word forms such as "dammelo", "dimmelo", "vattene", etc., where forms of several lexemes are combined into a single graphic word. This is very useful when analyzing text, because clitics are used very often in Italian.

Here is an example for the Lemmatizer:

query   -> vattene
result  -> andare
             (Cat V)(Contraction ti/Pron+ne/Pron+V)

Here is an example for the Analyzer:

query   -> spiegatemela
result  -> spiegare
            (Cat V)(Aux avere)(Mod Imp)(Pers 2nd)(Num PL)
            (Contraction mi/Pron+la/Pron+V)
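
A client application typically has to split such a result into its individual attribute-value pairs. The following is a minimal sketch, assuming the result features arrive as plain text in the "(Attribute value)" form shown in the examples above; the function name parse_features is hypothetical and not part of the product's API.

```python
import re

# Assumption: features are a run of "(Attribute value)" groups, as in the examples.
FEATURE_RE = re.compile(r"\(\s*([^\s()]+)\s+([^()]*?)\s*\)")


def parse_features(text):
    """Return the (attribute, value) pairs found in a feature string, in order."""
    return [(m.group(1), m.group(2)) for m in FEATURE_RE.finditer(text)]


features = parse_features(
    "(Cat V)(Aux avere)(Mod Imp)(Pers 2nd)(Num PL)"
    "(Contraction mi/Pron+la/Pron+V)"
)
for attribute, value in features:
    print(attribute, "=", value)
```

A list of pairs is used rather than a dictionary so that the original order is kept and repeated attributes, should they occur, are not lost.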

The Contraction Feature

The contraction feature specifies the contraction elements included in the answer. The examples above show the results for the queries "vattene" and "spiegatemela". The individual entities within the contraction feature are separated by the character '+'. An entity is described by its category alone (example: V) if it is an "open" entity, i.e. any entry of that category could potentially fill that position (subject to specific restrictions). An entity is instead specified by the sequence citation form, "/", category (example: mi/Pron) if it describes an element from a finite set of possibilities.

Here is a formal syntax description:

In the text representation, a contraction feature is an attribute-value pair in which the attribute is "Contraction" and the value is a sequence of entities:

contraction-feature ::= "(Contraction" value ")".

The value of the pair is composed of a sequence of entities separated by "+":

value               ::= entity {"+" entity}+.
entity              ::= [citation-form "/"] category.
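
The grammar above can be turned into a small parser on the client side. The sketch below assumes the value has already been extracted from the "(Contraction ...)" pair; the names Entity and parse_contraction are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple, Optional


class Entity(NamedTuple):
    citation_form: Optional[str]  # None for an "open" entity such as V
    category: str


def parse_contraction(value: str) -> List[Entity]:
    """Split a contraction value such as 'mi/Pron+la/Pron+V' into entities."""
    entities = []
    for part in value.split("+"):
        part = part.strip()
        # entity ::= [citation-form "/"] category
        form, slash, category = part.rpartition("/")
        if not category or (slash and not form):
            raise ValueError(f"malformed entity: {part!r}")
        entities.append(Entity(form if slash else None, category))
    return entities


for entity in parse_contraction("mi/Pron+la/Pron+V"):
    print(entity)
```

Open entities come back with citation_form set to None, so a client can tell them apart from closed-class elements such as mi/Pron without inspecting the category.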

Problem Feature

Features of the type (Problem xy) are related to the entry specification in our database. They can be ignored.
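
If a client prefers to discard these features before further processing, a filter along the following lines would do. This is only a sketch, assuming the features have already been split into (attribute, value) pairs; the function name drop_problem_features is hypothetical.

```python
def drop_problem_features(features):
    """Return the (attribute, value) pairs without any (Problem ...) entries."""
    return [(attr, value) for attr, value in features if attr != "Problem"]


cleaned = drop_problem_features([("Cat", "V"), ("Problem", "xy"), ("Num", "PL")])
print(cleaned)
```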