lucene-java-user mailing list archives

From karl wettin <karl.wet...@gmail.com>
Subject Re: Multi language indexing
Date Mon, 07 May 2007 11:05:24 GMT

On 7 May 2007, at 12:16, bhecht wrote:

> My question regarding "the way to go", was if it is a good solution to
> index a content of a table, using more than 1 analyzer, determining the
> analyzer by the country value of each record.

I'm not sure what you mean, but I'll try.

Are you asking whether it makes sense to stem text based on its
language and then put it in the same field, no matter what language
it is?

For the record, it usually makes very little sense to search text
stemmed for one language with a query stemmed for another language.
That is what happens if you store the stemmed text in the same field
regardless of language. You could add another field called
"language_iso" and add a boolean clause on it, but that would just be
overkill and would increase the response time.

In essence, it depends on your needs. For instance, are users
supposed to find documents written in languages other than the one
specified? Do you want to limit searches to a single content language?

My guess is that you probably want to index the unstemmed text in
"unstemmed_text" and the stemmed text in a language-specific field,
"stemmed_text_[language iso]" or similar. When searching, query both
the unstemmed field and the stemmed field for the user's language,
boosting the stemmed field.
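
To make the field layout concrete, here is a rough sketch in plain
Java, with no Lucene dependency. Everything in it is an assumption made
for illustration: the class and method names are hypothetical, the
"stemmer" is a toy function standing in for a real language analyzer,
and the boost value is arbitrary. It only shows which field each text
would go into and what a query string in Lucene's classic query syntax
could look like, assuming the parser's default OR operator.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

public class MultiLanguageFields {

    // Field names follow the naming suggested above.
    static final String UNSTEMMED_FIELD = "unstemmed_text";
    static final String STEMMED_PREFIX = "stemmed_text_";

    /** Name of the stemmed field for a language, e.g. "stemmed_text_en". */
    static String stemmedField(String languageIso) {
        return STEMMED_PREFIX + languageIso.toLowerCase(Locale.ROOT);
    }

    /**
     * Field name -> text to index for one record. The stemmer is passed in
     * as a plain function; in a real index an analyzer would do this work.
     */
    static Map<String, String> fieldsFor(String text, String languageIso,
                                         UnaryOperator<String> stemmer) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put(UNSTEMMED_FIELD, text);
        fields.put(stemmedField(languageIso), stemmer.apply(text));
        return fields;
    }

    /**
     * Query string with one clause per field: either field may match,
     * and the stemmed clause carries the boost. Special characters in
     * the terms are not escaped here.
     */
    static String queryFor(String terms, String userLanguageIso, float stemmedBoost) {
        return UNSTEMMED_FIELD + ":(" + terms + ") "
             + stemmedField(userLanguageIso) + ":(" + terms + ")^" + stemmedBoost;
    }

    public static void main(String[] args) {
        // Toy "stemmer" that only strips a word-final "s"; not a real stemmer.
        UnaryOperator<String> toyEnglish = s -> s.replaceAll("s\\b", "");
        System.out.println(fieldsFor("red houses", "en", toyEnglish));
        System.out.println(queryFor("houses", "en", 2.0f));
    }
}
```

With real analyzers, the query terms for each stemmed field would need
to go through the same stemmer that was used at index time. Something
like Lucene's PerFieldAnalyzerWrapper could probably be used to map
each stemmed field to its analyzer, but I have not shown that here.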

I hope this helps.

-- 
karl

