> But basically consider why this must be so, especially when
> stemming. Languages are so variable that you'd get wildly
> different (and inappropriate) results if you tried to analyze them
> with the same analyzer. Especially when you get different
> language encodings in the document.
Well... technically, encoding is outside the scope of Lucene, since we pass
in a Reader: the bytes have already been decoded to characters by the time
the analyser sees them.
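
To illustrate, here is a rough sketch of where the encoding decision is made
on the caller's side, before anything is analysed. It uses only the JDK; the
class name ReaderDecoding and the helpers open() and decode() are made up for
this example and are not part of Lucene. In a real indexer you would hand the
Reader from open() to your analyser or field instead of draining it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReaderDecoding {

    // Hypothetical helper: the caller, not the analyser, picks the charset.
    public static Reader open(byte[] raw, Charset charset) {
        return new InputStreamReader(new ByteArrayInputStream(raw), charset);
    }

    // Hypothetical helper: drains the Reader so we can inspect what an
    // analyser would receive as input characters.
    public static String decode(byte[] raw, Charset charset) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try (Reader reader = open(raw, charset)) {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "caf\u00e9" encoded as ISO-8859-1 (one byte per character).
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        // Decoded with the charset it was written in: text round-trips.
        System.out.println(decode(latin1, StandardCharsets.ISO_8859_1));

        // Decoded with the wrong charset: the last character is damaged
        // before any analyser gets a chance to look at it.
        System.out.println(decode(latin1, StandardCharsets.UTF_8));
    }
}
```

If the charset is wrong, no choice of analyser will repair the text
afterwards, which is why it has to be sorted out before the Reader is built.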
I have to say though, analysing with the most naive analyser possible (the
default one with no stop words and no stemming) works well enough.
Language detection isn't yet reliable enough to be used to pick the analyser
automatically.
Daniel