lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Marvin Humphrey <mar...@rectangular.com>
Subject Re: StopAnalyzer and apostrophes
Date Fri, 07 Apr 2006 00:04:43 GMT

On Apr 6, 2006, at 4:23 PM, Daniel Noll wrote:

> Marvin Humphrey wrote:
>> I wrote:
>>> It looks like StopAnalyzer tokenizes by letter, and doesn't  
>>> handle apostrophes.  So, the input "I don't know" produces these  
>>> tokens:
>>>
>>>     don
>>>     t
>>>     know
>>>
>>> Is that right?
>> It's not right.  StopAnalyzer does tokenize letter by letter, but  
>> 't' is a stopword, so the tokens are:
>>     don
>>     know
>> Phew, that's much more useful.
>
> Naturally.  But if you actually cared about apostrophes you would  
> be using StandardAnalyzer instead, right?

I'd be using PolyAnalyzer ;)

<http://www.rectangular.com/kinosearch/docs/devel/KinoSearch/Analysis/ 
PolyAnalyzer.html>

... unless I were trying to duplicate the behavior of StopAnalyzer  
for benchmarking purposes.

Cheers,

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message