lucene-java-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Migrating SnowballAnalyzer to 4.1
Date Fri, 15 Mar 2013 15:29:47 GMT
2013/2/28 Steve Rowe <sarowe@gmail.com>:

> EnglishAnalyzer has used PorterStemmer instead of the English Snowball stemmer since
> it was created in 2010 as part of LUCENE-2055[2].  I think this is an oversight: EnglishAnalyzer
> should incorporate the best English stemmer we've got, and Martin Porter says the Porter2
> stemmer is better[1].  Robert Muir (who wrote EnglishAnalyzer), if you're reading, what do
> you think?

This was intentional, actually. Choosing the default was a tradeoff
between the "benefits" (which affect less than 5% of English
vocabulary, if you read around the Snowball site) and a much more
significant performance difference.

For example, when I ran tests indexing both short and long texts:

http://find.searchhub.org/document/c1d3301b71dab5ca#46a8351089a98aec

That's overall indexing speed, not just text analysis.

It might be that the Snowball stemmer is faster these days, too
(we've made some improvements).
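
Since the speed gap may have changed, one rough way to re-check it is to
time each stemmer over the same tokens. The following is only a minimal
sketch under stated assumptions: the class StemTiming, the method
timeNanos, and the two lambdas are hypothetical names made up for
illustration, and the lambdas are crude placeholders, not the Porter or
Snowball algorithms and not Lucene API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

public class StemTiming {
    /**
     * Applies the given stemmer function to every token, `rounds` times,
     * and returns the elapsed wall-clock time in nanoseconds.
     */
    public static long timeNanos(UnaryOperator<String> stemmer,
                                 List<String> tokens, int rounds) {
        long sink = 0; // accumulate output lengths so the work is not optimized away
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (String token : tokens) {
                sink += stemmer.apply(token).length();
            }
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) {
            // Lengths are never negative; this check only keeps `sink` live.
            throw new IllegalStateException("unexpected negative length sum");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        List<String> tokens =
            Arrays.asList("running", "generously", "stemmers", "indexing");

        // Placeholders only (assumption): swap in the real stemmers under test.
        UnaryOperator<String> stemmerA =
            s -> s.endsWith("s") ? s.substring(0, s.length() - 1) : s;
        UnaryOperator<String> stemmerB =
            s -> s.replaceAll("(ing|ly|s)$", "");

        // Warm-up passes so the JIT has compiled both paths before timing.
        timeNanos(stemmerA, tokens, 10_000);
        timeNanos(stemmerB, tokens, 10_000);

        System.out.println("A: " + timeNanos(stemmerA, tokens, 100_000) + " ns");
        System.out.println("B: " + timeNanos(stemmerB, tokens, 100_000) + " ns");
    }
}
```

A loop like this measures text analysis in isolation. The numbers
linked above are for overall indexing speed, so a fair comparison
would index the same documents once with each analyzer and time the
whole run.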
