lucene-java-user mailing list archives

From Alex Murzaku <murz...@yahoo.com>
Subject analyzers using snowball
Date Sat, 21 Sep 2002 15:15:38 GMT
Since questions come up from time to time about whether Lucene
supports specific natural languages, I adapted a set of analyzers and
filters to use the Java stemmers generated by Snowball
(http://snowball.tartarus.org). This could be a good starting point for
anybody who needs to go into more detail for a particular language (as
the existing Russian and German analyzers do). It uses the
StandardTokenizer, which works fine for all of these languages except
Russian.
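
To illustrate the tokenize, lowercase, stem chain that such an analyzer
wraps, here is a minimal self-contained sketch. It is not the code from
the package and it does not use the Lucene or Snowball classes: the
Stemmer interface, the TOY_ENGLISH suffix stripper and the tokenize
helper are all hypothetical stand-ins, chosen only so the example runs
on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

public class StemPipelineSketch {

    /** Hypothetical stand-in for a generated stemmer: one term in, one stem out. */
    interface Stemmer extends UnaryOperator<String> {}

    /** Toy suffix stripper so the sketch runs; this is NOT the Porter2 algorithm. */
    static final Stemmer TOY_ENGLISH = term -> {
        for (String suffix : new String[] {"ing", "es", "s"}) {
            // Only strip when at least three characters of stem remain.
            if (term.length() > suffix.length() + 2 && term.endsWith(suffix)) {
                return term.substring(0, term.length() - suffix.length());
            }
        }
        return term;
    };

    /** Crude tokenizer (an assumption): split on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    /** The analyzer chain: tokenize, lowercase, then stem each token. */
    static List<String> analyze(String text, Stemmer stemmer) {
        List<String> stems = new ArrayList<>();
        for (String token : tokenize(text)) {
            stems.add(stemmer.apply(token.toLowerCase(Locale.ROOT)));
        }
        return stems;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Indexing the stemmed tokens", TOY_ENGLISH));
    }
}
```

Supporting another language in this arrangement means passing a
different Stemmer to analyze; the tokenizing and lowercasing steps stay
the same, which is the reason one tokenizer can be shared across most of
the languages.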

The whole package is available at http://download.lissus.com/snowball.zip
and is about 2.3MB. It is that large because it also contains all the
test dictionaries for the 12 supported languages: Danish, Dutch,
English (Porter2), Finnish, French, German, Italian, Norwegian,
Portuguese, Russian, Spanish and Swedish. Finnish has some minor
problems, and I wasn't able to test Russian properly since I am not
familiar with its character encodings. But I wouldn't bother with
Russian (or German) since analyzers for both are already included in
the Lucene package. As for Finnish, I am already in contact with the
Snowball team, and hopefully it will work in Java as well as it does in
the other environments.

Best regards,

Alex


=====
__________________________________
alex@lissus.com -- http://www.lissus.com



