Automatic Corpus-based Acquisition of Binary Terms
ACABIT is distributed under the GPL.
ACABIT is a terminology extraction program. It takes a linguistically annotated corpus as input and outputs a list of multi-word term (MWT) candidates, ranked from the most to the least representative of the corpus according to their log-likelihood score. For each MWT candidate, an XML structure is provided that gathers all the base structures and variations encountered.
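To make the ranking step concrete, here is a minimal sketch of a log-likelihood score computed over a 2x2 contingency table of lemma pairs. It is an illustration only and is not taken from ACABIT's source: the function names (`xlogx`, `loglike`, `rank_candidates`), the input format (a dictionary of pair counts), and the exact form of the formula are assumptions, and ACABIT's own implementation may differ.

```python
import math


def xlogx(x):
    """Return x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def loglike(a, b, c, d):
    """Log-likelihood score for a 2x2 contingency table (assumed form).

    a: occurrences of the pair (w1, w2)
    b: occurrences of pairs (w1, other)
    c: occurrences of pairs (other, w2)
    d: occurrences of pairs containing neither w1 nor w2
    """
    n = a + b + c + d
    return (xlogx(a) + xlogx(b) + xlogx(c) + xlogx(d)
            - xlogx(a + b) - xlogx(a + c)
            - xlogx(b + d) - xlogx(c + d)
            + xlogx(n))


def rank_candidates(pair_counts):
    """Rank candidate pairs from highest to lowest score.

    pair_counts is a hypothetical input: a dict mapping
    (lemma1, lemma2) to the frequency of that pair in the corpus.
    """
    n = sum(pair_counts.values())
    left, right = {}, {}
    for (w1, w2), freq in pair_counts.items():
        left[w1] = left.get(w1, 0) + freq
        right[w2] = right.get(w2, 0) + freq

    scored = []
    for (w1, w2), a in pair_counts.items():
        b = left[w1] - a       # w1 paired with something else
        c = right[w2] - a      # something else paired with w2
        d = n - a - b - c      # pairs involving neither word
        scored.append(((w1, w2), loglike(a, b, c, d)))

    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    counts = {
        ("machine", "translation"): 12,
        ("machine", "system"): 3,
        ("term", "extraction"): 9,
        ("term", "system"): 2,
    }
    for pair, score in rank_candidates(counts):
        print(pair, round(score, 3))
```

Note that this score measures how far a pair departs from independence in either direction, so a real extractor would typically apply it only to candidates that already match a linguistic pattern.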
ACABIT uses the following programs:
- Brill's POS tagger for French (ATILF)
- French lemmatizer FLEMM (WARNING: the output data of FLEMM has been modified. You need to use FLEMM-v2.0 (1999))
- Brill's POS tagger (BRILL)
- Lemmatizer: lexical database CELEX
Loading
Old versions
To understand ACABIT, please read some of my publications, for example:
[Daille, B. 2003b]. B. DAILLE, "Conceptual structuring through term variations". In F. Bond, A. Korhonen, D. McCarthy and A. Villavicencio (eds.), Proceedings of the ACL 2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment, pp. 9-16, 2003.
Last Updated (Thursday, 11 March 2010)