lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Trejkaz <trej...@trypticon.org>
Subject Re: AW: Best practices for multiple languages?
Date Wed, 19 Jan 2011 23:28:29 GMT
On Thu, Jan 20, 2011 at 9:08 AM, Paul Libbrecht <paul@hoplahup.net> wrote:
>>> Wouldn't it be better to prefer precise matches (a field that is
>>> analyzed with StandardAnalyzer for example) but also allow matches are
>>> stemmed.
>>
>> StandardAnalyzer isn't quite precise, is it?  StandardFilter does some
>> kind of English-centric alterations to things.
>
> from here:
> http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/analysis/standard/StandardTokenizer.html
>
> I can only conclude that it handles correctly the characters variety but does not stemming.

Doesn't StandardAnalyzer also run this?
http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/analysis/standard/StandardFilter.html

This thing definitely performs English-specific filtering for "'s" and "'S".

Daniel

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message