lucene-java-user mailing list archives

From Daniel Noll <dan...@nuix.com>
Subject Re: How international languages are supported in Lucene
Date Thu, 05 Jun 2008 23:36:11 GMT
> But basically consider why this must be so, especially when
> stemming. Languages are so variable that you'd get wildly
> different (and inappropriate) results if you tried to analyze them
> with the same analyzer. Especially when you get different
> language encodings in the document.

Well... technically encoding is out of the scope of Lucene, since we're passing
in a Reader: the bytes have already been decoded to characters by the time the
analyser sees them.
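
To illustrate, here is a rough sketch using only the JDK (no Lucene calls; the
class and method names are made up for this example). The caller chooses the
charset when it builds the Reader, and whatever consumes the Reader only ever
sees characters.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReaderDecodingSketch {
    // Hypothetical helper: charset decoding happens here, before any
    // analyser gets involved.
    static Reader open(byte[] raw, Charset charset) {
        return new InputStreamReader(new ByteArrayInputStream(raw), charset);
    }

    // Drains a Reader into a String; stands in for whatever consumes the Reader.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "na\u00efve caf\u00e9";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        // Different bytes on disk, identical characters once decoded with
        // the matching charset.
        System.out.println(readAll(open(utf8, StandardCharsets.UTF_8)).equals(text));
        System.out.println(readAll(open(latin1, StandardCharsets.ISO_8859_1)).equals(text));
    }
}
```

If the caller names the wrong charset, the characters are already wrong before
analysis starts, and no analyser can repair that.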

I have to say though, analysing with the most naive analyser possible (the
default one with no stop words and no stemming) works well enough across
languages in my experience.
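
For a feel of what "naive" means here, a minimal sketch in plain Java. This is
not Lucene's tokenizer, just a stand-in with a hypothetical class name: it
splits on anything that is not a letter or digit, lowercases, and does nothing
else, so there are no stop words and no stemming.

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveTokenizerSketch {
    // Splits on non-letter/non-digit code points and lowercases each token.
    // No stop-word removal and no stemming, so nothing language-specific.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isLetterOrDigit(cp)) {
                current.appendCodePoint(Character.toLowerCase(cp));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "the" survives because there is no stop list.
        System.out.println(tokenize("The Quick, brown fox")); // [the, quick, brown, fox]
        System.out.println(tokenize("\u00dcber die Br\u00fccke"));
    }
}
```

One caveat with a sketch this simple: text written without spaces between
words (Chinese, Japanese) comes out as one long token per run.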

Language detection isn't yet reliable enough to be used to choose the
analyser automatically.

Daniel
