lucene-java-user mailing list archives

From Michal Hlavac <hla...@hlavki.eu>
Subject JLemmaGen project
Date Wed, 23 Oct 2013 15:17:32 GMT
Hi,

I rewrote the LemmaGen lemmatizer project (http://lemmatise.ijs.si/) in Java. It was originally
written in C#.
LemmaGen uses rules to lemmatize words. The algorithm is described here:
http://lemmatise.ijs.si/Download/File/Documentation%23JournalPaper.pdf
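
To give a rough idea of what rule-based lemmatization means, here is a minimal sketch of suffix-rewrite rules with longest-suffix matching. It is only an illustration: the class SuffixRuleLemmatizer, its methods, and the three English rules are hypothetical. It is not the JLemmaGen API and not the rule tree from the paper; real rules are learned from a dictionary.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of suffix-rewrite rules; not the JLemmaGen API. */
final class SuffixRuleLemmatizer {

    /** One rule: if a word ends with {@code suffix}, replace that suffix with {@code replacement}. */
    private static final class Rule {
        final String suffix;
        final String replacement;

        Rule(String suffix, String replacement) {
            this.suffix = suffix;
            this.replacement = replacement;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Registers a rule and returns this instance so calls can be chained. */
    SuffixRuleLemmatizer addRule(String suffix, String replacement) {
        rules.add(new Rule(suffix, replacement));
        return this;
    }

    /** Applies the rule with the longest matching suffix; returns the word unchanged if none match. */
    String lemmatize(String word) {
        Rule best = null;
        for (Rule rule : rules) {
            boolean matches = word.endsWith(rule.suffix);
            if (matches && (best == null || rule.suffix.length() > best.suffix.length())) {
                best = rule;
            }
        }
        if (best == null) {
            return word;
        }
        String stem = word.substring(0, word.length() - best.suffix.length());
        return stem + best.replacement;
    }

    public static void main(String[] args) {
        // Hand-written toy rules (an assumption for this sketch), not learned ones.
        SuffixRuleLemmatizer lemmatizer = new SuffixRuleLemmatizer()
                .addRule("ies", "y")
                .addRule("ing", "")
                .addRule("s", "");

        System.out.println(lemmatizer.lemmatize("ponies"));  // pony
        System.out.println(lemmatizer.lemmatize("walking")); // walk
        System.out.println(lemmatizer.lemmatize("tree"));    // tree
    }
}
```

Preferring the longest matching suffix lets a specific rule ("ies") override a general one ("s"), which is the basic idea behind having more specific rules act as exceptions to general ones.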

The project is licensed under GPLv3. The sources are hosted on Bitbucket:
https://bitbucket.org/hlavki/jlemmagen

There is also the Lemmagen4j project, which uses more memory and does not come with prebuilt trees.

I also obtained licensed dictionaries to build rule trees for 15 languages. The dictionaries are
licensed, but the prebuilt trees are not.
You can also build your own dictionary.

The project also contains a TokenFilter for Lucene/Solr.
The project is not stable yet, but any feedback is appreciated.

Supported languages are:
mlteast-bg - Bulgarian
mlteast-cs - Czech
mlteast-en - English
mlteast-et - Estonian
mlteast-fr - French
mlteast-hu - Hungarian
mlteast-mk - Macedonian
mlteast-pl - Polish
mlteast-ro - Romanian
mlteast-ru - Russian
mlteast-sk - Slovak
mlteast-sl - Slovene
mlteast-sr - Serbian
mlteast-uk - Ukrainian

thanks, miso



