lucene-dev mailing list archives

From Leo Galambos <galam...@com-os2.ms.mff.cuni.cz>
Subject RE: Multiple languages in same index?
Date Wed, 29 Jan 2003 20:01:24 GMT
I would use a prefix for each term, i.e. term T from analyzer A would be
indexed as <A>T and not just T (you would override the method that
constructs the tokens). I do not know how Lucene implements fields, but
this solution could be faster.

It could solve the problem if the analyzers process independent sets
(e.g. German vs. English).
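
A minimal sketch of the prefix idea, in plain Java rather than against the
Lucene API. The class PrefixedTerms and its methods prefix() and analyze()
are hypothetical names used only for illustration, and the "analyzer" here
is a stand-in that lowercases and splits on whitespace. In Lucene itself the
prefix would presumably be applied where the analyzer builds its tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: tags each term with the
// identifier of the analyzer (language) that produced it.
final class PrefixedTerms {

    // Builds the index term "<A>T" for analyzer id A and term T.
    static String prefix(String analyzerId, String term) {
        return "<" + analyzerId + ">" + term;
    }

    // Stand-in for a real analyzer: lowercases, splits on whitespace,
    // and prefixes every resulting token with the analyzer id.
    static List<String> analyze(String analyzerId, String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(prefix(analyzerId, token));
            }
        }
        return out;
    }
}
```

With this, the German "Hand" and the English "hand" end up as the distinct
terms <de>hand and <en>hand, so a query analyzed with one analyzer only
matches terms indexed by the same analyzer. The query side has to apply the
same prefix.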

Just a thought...

-g-

On Wed, 29 Jan 2003, John Cwikla wrote:

> 
> 
> Be very careful with the multiple index approach, especially if
> you are trying to keep everything on the same machine in the
> same process, since 10 languages means 10x more file handles
> opened in Lucene. You can easily get up to the tens of thousands
> of file handles opened if you have lots of fields.
> 
> cwikla
> 
> -----Original Message-----
> From: Sale, Doug [mailto:dsale@us.britannica.com]
> Sent: Wednesday, January 29, 2003 7:40 AM
> To: 'Lucene Developers List'
> Subject: RE: Multiple languages in same index?
> 
> 
> randy,
> 
> you could use different analyzers over the same index, both indexing and
> searching.  however, your search results will be bunk (that's bad).
> 
> you would be better off maintaining separate indexes for each language
> (analyzer).
> 
> it might be possible to use 1 index, provided a field was added to each
> entry that defined the analyzer used on it.  you would then search first
> over the entire index for entries whose analyzer matched the one you are
> going to use on the input query (and then do your "regular" search over that
> subset).  i.e., it's a pain, better to do it in multiple indexes.
> 
> -doug  
> 
> > -----Original Message-----
> > From: Randy Darling [mailto:rdarling@imanage.com]
> > Sent: Tuesday, January 28, 2003 4:18 PM
> > To: lucene-dev@jakarta.apache.org
> > Subject: Multiple languages in same index?
> > 
> > 
> > 
> > Is it ok to index documents that have Chinese, German and English
> > in the same index?  From what I can tell I just need to use a
> > different analyzer when I create an IndexWriter.  But I do not see
> > a way to search with an analyzer for a specific language.
> > 
> > Or do I need to create a separate index for each language?
> > 
> > 
> > Thanks,
> > Randy
> > 
> > 
> > --
> > To unsubscribe, e-mail:   
> > <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail: 
> > <mailto:lucene-dev-help@jakarta.apache.org>
> > 
> 

