lucene-java-user mailing list archives

From Michal Hlavac <>
Subject JLemmaGen project
Date Wed, 23 Oct 2013 15:17:32 GMT

I rewrote the lemmatizer project LemmaGen in Java; it was originally
written in C#.
The LemmaGen project uses rules to lemmatize words. The algorithm is described here:

The project is licensed under GPLv3. The sources are hosted on Bitbucket:

There is also the Lemmagen4j project, which uses more memory and does not come with prebuilt trees.

I also obtained licensed dictionaries to build rule trees for 15 languages. The dictionaries
are licensed, but the prebuilt trees are not.
You can also build your own dictionary.
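To illustrate roughly what a rule tree built from a dictionary can look like, here is a minimal Java sketch. It is not the JLemmaGen API and not the LemmaGen learning algorithm, which is more involved; the class name `SuffixRuleTree`, its methods, and the majority-vote scheme are hypothetical simplifications. Each (word, lemma) pair yields a rule "strip N trailing characters, then append a suffix". The rule is counted on every node of a trie keyed by the word's characters read from the end, and a word is lemmatized with the most frequent rule of the deepest node its suffix reaches.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified suffix-rule tree (not the JLemmaGen API).
 * Rules are learned from (word, lemma) pairs and stored along the reversed word.
 */
public class SuffixRuleTree {

    /** Rule: strip `strip` trailing characters, then append `append`. */
    record Rule(int strip, String append) {
        String apply(String word) {
            if (strip > word.length()) return word; // rule does not fit: leave word unchanged
            return word.substring(0, word.length() - strip) + append;
        }

        /** Derive the rule turning word into lemma from their longest common prefix. */
        static Rule derive(String word, String lemma) {
            int p = 0;
            while (p < word.length() && p < lemma.length() && word.charAt(p) == lemma.charAt(p)) p++;
            return new Rule(word.length() - p, lemma.substring(p));
        }
    }

    /** Trie node keyed by characters read from the end of the word. */
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        final Map<Rule, Integer> ruleCounts = new HashMap<>();

        /** Most frequent rule recorded at this node, or null if none. */
        Rule bestRule() {
            return ruleCounts.entrySet().stream()
                    .max(Map.Entry.comparingByValue())
                    .map(Map.Entry::getKey)
                    .orElse(null);
        }
    }

    private final Node root = new Node();

    /** Add one dictionary entry, counting its rule on every node along the reversed word. */
    public void add(String word, String lemma) {
        Rule rule = Rule.derive(word, lemma);
        Node node = root;
        node.ruleCounts.merge(rule, 1, Integer::sum);
        for (int i = word.length() - 1; i >= 0; i--) {
            node = node.children.computeIfAbsent(word.charAt(i), c -> new Node());
            node.ruleCounts.merge(rule, 1, Integer::sum);
        }
    }

    /** Follow the word's suffix as deep as the tree allows and apply that node's best rule. */
    public String lemmatize(String word) {
        Node node = root;
        Rule rule = root.bestRule();
        for (int i = word.length() - 1; i >= 0; i--) {
            Node next = node.children.get(word.charAt(i));
            if (next == null) break;
            node = next;
            rule = node.bestRule();
        }
        return rule == null ? word : rule.apply(word);
    }

    public static void main(String[] args) {
        SuffixRuleTree tree = new SuffixRuleTree();
        tree.add("walked", "walk");
        tree.add("talked", "talk");
        tree.add("running", "run");
        tree.add("cats", "cat");
        System.out.println(tree.lemmatize("jumped")); // unseen word, matches suffix "ed"
        System.out.println(tree.lemmatize("dogs"));   // unseen word, matches suffix "s"
    }
}
```

With the four training pairs in `main`, the unseen word `jumped` reuses the rule learned from `walked`/`talked` (strip two characters) and `dogs` reuses the rule learned from `cats`, so the program prints `jump` and then `dog`. Storing rules on suffix nodes is what lets a tree built from a finite dictionary generalize to words it has never seen.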

The project also includes a TokenFilter for Lucene/Solr.
It is not stable yet, but any feedback is appreciated.

Supported languages are:
mlteast-bg - Bulgarian
mlteast-cs - Czech
mlteast-en - English
mlteast-et - Estonian
mlteast-fr - French
mlteast-hu - Hungarian
mlteast-mk - Macedonian
mlteast-pl - Polish
mlteast-ro - Romanian
mlteast-ru - Russian
mlteast-sk - Slovak
mlteast-sl - Slovene
mlteast-sr - Serbian
mlteast-uk - Ukrainian

thanks, miso
