I wrote:
> It looks like StopAnalyzer tokenizes by letter, and doesn't handle
> apostrophes. So, the input "I don't know" produces these tokens:
>
> don
> t
> know
>
> Is that right?
It's not right. StopAnalyzer does tokenize by letter, so "don't" is
split at the apostrophe, but 't' is a stopword, so the tokens are:
don
know
Phew, that's much more useful.
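
For anyone who wants to see the effect without setting up an index, here is a rough sketch in plain Java that mimics the behaviour described above: split on non-letters, lowercase, then drop stopwords. It does not call Lucene at all. The class name StopSketch, the tokenize method and the tiny stop set are made up for illustration; the stop set is an assumption and is not Lucene's actual list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopSketch {
    // Hypothetical stop set for illustration only; NOT Lucene's real list.
    static final Set<String> STOP_WORDS = Set.of("s", "t", "a", "the");

    // Split on runs of non-letters, lowercase each piece, drop stopwords.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}]+")) {
            String token = piece.toLowerCase(Locale.ROOT);
            // split() can yield an empty leading piece; skip it.
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The apostrophe splits "don't" into "don" and "t"; "t" is then removed.
        System.out.println(tokenize("don't know")); // prints [don, know]
    }
}
```

With "t" in the stop set, the stray fragment after the apostrophe disappears and only the useful tokens are left.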
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/