lucene-java-user mailing list archives

From Andy Roberts <m...@andy-roberts.net>
Subject Re: Multi-analyzer ?
Date Tue, 12 Apr 2005 07:31:01 GMT
On Tuesday 12 Apr 2005 00:53, Eric Chow wrote:
> But how about one document contains more than two different languages ??
>
>
> Eric

If you're indexing many documents that contain multiple languages, then it's 
probably better just to use a SimpleAnalyzer, rather than one that does any 
language-specific stemming or removal of stoplist words.
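
To illustrate why a language-neutral analysis is the safer choice here, below 
is a minimal plain-Java sketch. It does not use Lucene's API. The tokenize 
method only approximates what SimpleAnalyzer does (split at non-letters, then 
lowercase), and the class name, method names and the tiny German stoplist are 
all made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class SimpleAnalysisSketch {

    // Hypothetical, deliberately tiny German stoplist (illustration only).
    static final Set<String> GERMAN_STOPWORDS =
            Set.of("der", "die", "das", "und", "sind");

    // Language-neutral analysis: split at non-letters and lowercase.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                current.appendCodePoint(Character.toLowerCase(cp));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Language-specific step: drop every token found in the stoplist.
    static List<String> removeStopwords(List<String> tokens, Set<String> stoplist) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stoplist.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // One English sentence followed by one German sentence.
        String text = "The die is cast. Die Karten sind gemischt.";
        List<String> tokens = tokenize(text);
        // prints [the, die, is, cast, die, karten, sind, gemischt]
        System.out.println(tokens);
        // prints [the, is, cast, karten, gemischt]
        System.out.println(removeStopwords(tokens, GERMAN_STOPWORDS));
    }
}
```

With the German stoplist applied, the English noun "die" disappears from the 
index along with the German article, which is the kind of cross-language 
damage the simple approach avoids.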

If there are documents where one language is clearly more dominant than the 
other, then it would probably be OK to use an Analyzer for that language and 
hope it doesn't affect the indexing of the other language too much. However, 
it's clear that a single language-specific Analyzer can't really accommodate 
multi-language documents. It would be much easier to ensure all docs were in 
a single language before indexing.
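
One rough way to decide whether a document has a "clearly dominant" language 
is to count stoplist hits per language. The sketch below is plain Java and 
only an assumption about how this could be done, not anything Lucene provides. 
The class name, the stoplists and the minShare threshold are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DominantLanguageSketch {

    // Hypothetical, tiny stoplists keyed by language code (illustration only).
    static final Map<String, Set<String>> STOPWORDS = Map.of(
            "en", Set.of("the", "is", "of", "and", "to"),
            "de", Set.of("der", "die", "das", "und", "ist"));

    // Returns the language whose stoplist matches the most tokens, or null
    // when no language accounts for at least minShare of all stoplist hits.
    // Tokens are expected to be lowercased already. Use minShare above 0.5
    // so that a tie between two languages is never reported as dominant.
    static String dominantLanguage(List<String> tokens, double minShare) {
        String best = null;
        int bestHits = 0;
        int totalHits = 0;
        for (Map.Entry<String, Set<String>> entry : STOPWORDS.entrySet()) {
            int hits = 0;
            for (String token : tokens) {
                if (entry.getValue().contains(token)) {
                    hits++;
                }
            }
            totalHits += hits;
            if (hits > bestHits) {
                bestHits = hits;
                best = entry.getKey();
            }
        }
        if (totalHits == 0 || (double) bestHits / totalHits < minShare) {
            return null; // no clearly dominant language
        }
        return best;
    }
}
```

A null result would be the cue to fall back to the language-neutral analyzer 
rather than picking a language-specific one.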

Andy
