lucene-dev mailing list archives

From: Andrzej Bialecki <...@getopt.org>
Subject: Re: Ok to add method IndexWriter.addDocument( Analyzer, Document ) ?
Date: Thu, 26 Jun 2003 21:53:58 GMT
Randy Darling wrote:
> 
> Would it be ok to add an extra addDocument method to
> IndexWriter that would take an analyzer in addition to
> the document?
> 
> I am going to be indexing documents for multiple languages
> and I would prefer to not have to reopen a writer for
> each document that we are going to index.
> 
> I took a look at the code and it looks pretty straightforward,
> and it didn't look like it would break anything.

I had the same problem, but I came up with a workaround that might be 
helpful to you. I wrote a facade analyzer that selects the 
appropriate language-specific analyzer just before I call addDocument. 
Something like:

	// SwitchLangAnalyzer is my own facade class, not part of Lucene
	SwitchLangAnalyzer sla = new SwitchLangAnalyzer(new Analyzer[] {
		new GermanAnalyzer(), new RussianAnalyzer(), new SwedishAnalyzer()});
	IndexWriter iw = new IndexWriter(dir, sla, true);
	// add German doc
	sla.select(0);
	iw.addDocument(germanDoc);
	// add Russian doc
	sla.select(1);
	iw.addDocument(russianDoc);

...and so on.
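A minimal, self-contained sketch of such a facade follows. It deliberately does not use Lucene: TermAnalyzer, ToyGermanAnalyzer and ToyPlainAnalyzer are hypothetical stand-ins for Lucene's Analyzer and the real language-specific analyzers, and this SwitchLangAnalyzer is only a guess at the shape of the class described above, not its actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a Lucene Analyzer: turns text into a list of terms.
interface TermAnalyzer {
    List<String> analyze(String text);
}

// Facade that delegates to whichever sub-analyzer is currently selected.
// The selection is mutable state, so one instance must not be shared across threads.
class SwitchLangAnalyzer implements TermAnalyzer {
    private final TermAnalyzer[] analyzers;
    private int current = 0;

    SwitchLangAnalyzer(TermAnalyzer[] analyzers) {
        if (analyzers.length == 0) {
            throw new IllegalArgumentException("at least one analyzer is required");
        }
        this.analyzers = analyzers.clone();
    }

    // Select the sub-analyzer to use for the next document(s).
    void select(int index) {
        if (index < 0 || index >= analyzers.length) {
            throw new IndexOutOfBoundsException("no analyzer at index " + index);
        }
        current = index;
    }

    @Override
    public List<String> analyze(String text) {
        return analyzers[current].analyze(text);
    }
}

// Toy "German" analyzer: lowercase, split on whitespace, drop a few German stop words.
class ToyGermanAnalyzer implements TermAnalyzer {
    private static final List<String> STOP_WORDS = Arrays.asList("der", "die", "das", "und");

    @Override
    public List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                terms.add(t);
            }
        }
        return terms;
    }
}

// Toy language-neutral analyzer: lowercase and split on whitespace only.
class ToyPlainAnalyzer implements TermAnalyzer {
    @Override
    public List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }
}
```

With select(0), "Der Hund und die Katze" is analyzed by the toy German analyzer and yields [hund, katze]; after select(1) the same facade yields every lowercased token. Because the current selection lives inside the analyzer, each select() call has to happen before the matching addDocument(), and documents should be added from a single thread.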

You need to be extra careful, though, about how you use such an index 
afterwards, especially if you use stemming or stop words. I also store a 
"lang" field, which I use to limit searches to documents in a given 
language, and I analyze queries with the same sub-analyzer that was used 
for documents in that language.
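The two precautions (a stored "lang" field and the same sub-analyzer at query time) can be illustrated with a toy in-memory inverted index. This is an assumption-laden sketch, not Lucene code: LangIndex, LangAnalyzer and the two "stemmers" (which only strip a suffix) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a language-specific Lucene Analyzer.
interface LangAnalyzer {
    List<String> analyze(String text);
}

// Toy German "stemmer": strips a trailing "-en" from longer words (not a real stemmer).
class ToyGermanStemmer implements LangAnalyzer {
    @Override
    public List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (t.isEmpty()) continue;
            terms.add(t.length() > 4 && t.endsWith("en") ? t.substring(0, t.length() - 2) : t);
        }
        return terms;
    }
}

// Toy English "stemmer": strips a trailing "-s" from longer words (not a real stemmer).
class ToyEnglishStemmer implements LangAnalyzer {
    @Override
    public List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (t.isEmpty()) continue;
            terms.add(t.length() > 3 && t.endsWith("s") ? t.substring(0, t.length() - 1) : t);
        }
        return terms;
    }
}

// Tiny inverted index that records each document's language, like a stored "lang" field.
class LangIndex {
    private final Map<String, LangAnalyzer> analyzers;
    private final List<String> docLang = new ArrayList<>();
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    LangIndex(Map<String, LangAnalyzer> analyzers) {
        this.analyzers = new HashMap<>(analyzers);
    }

    // Index a document with the sub-analyzer for its language; returns the doc id.
    int add(String lang, String text) {
        LangAnalyzer analyzer = analyzerFor(lang);
        int id = docLang.size();
        docLang.add(lang);
        for (String term : analyzer.analyze(text)) {
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(id);
        }
        return id;
    }

    // Analyze the query with the same sub-analyzer as at index time and
    // keep only documents whose "lang" matches (OR semantics across terms).
    Set<Integer> search(String lang, String query) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : analyzerFor(lang).analyze(query)) {
            for (int id : postings.getOrDefault(term, Collections.emptySet())) {
                if (docLang.get(id).equals(lang)) {
                    hits.add(id);
                }
            }
        }
        return hits;
    }

    private LangAnalyzer analyzerFor(String lang) {
        LangAnalyzer a = analyzers.get(lang);
        if (a == null) {
            throw new IllegalArgumentException("no analyzer for language " + lang);
        }
        return a;
    }
}
```

If a German document "Die Katzen schlafen im Hotel" and an English document "The cats sleep in the hotel" are indexed, searching "hotel" with lang "de" returns only the German document, because of the language filter. Searching "Katzen" with lang "de" matches (both sides reduce it to "katz"), while the same query analyzed by the English toy stemmer stays "katzen" and matches nothing, which is the kind of silent mismatch that using the wrong sub-analyzer for queries produces.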

-- 
Best regards,
Andrzej Bialecki

-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

