lucene-java-user mailing list archives

From Daniel Noll <dan...@nuix.com.au>
Subject Re: Use two Analyzers in Lucene
Date Sun, 02 Apr 2006 23:33:47 GMT
Kostas V. wrote:
> I have the Analyzers for both languages (they do stemming as well) but I
> don't know how to use them together. I imagine that I have to do two passes
> for each paper, or is this not correct?
> The following line is how I use my English Analyzer:
> 
> IndexWriter writer = new IndexWriter(indexPath, new PorterStemAnalyzer(), true);
> 
> And this is for the Greek one:
> 
> IndexWriter writer = new IndexWriter(indexPath, new GreekAnalyzer(), true);
> 
> Is it possible?
> And when I search, how can the program use both Analyzers as well?
> I was told to make a mixed Analyzer, but I don't know if this is possible.

The general idea would be to make an analyser which chooses which 
analyser to pass the text to.  For arbitrary languages this would be 
rather difficult, but in your particular situation, Greek and English 
use different alphabets so it may not be too hard.

From a quick look at the GreekAnalyzer, it still uses the 
StandardTokenizer, and its filters don't look like they would 
interfere with the ones the English analyser uses.  So you could 
probably make an analyser which performs both, something like this:

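   import java.io.Reader;
   import org.apache.lucene.analysis.Analyzer;
   import org.apache.lucene.analysis.LowerCaseFilter;
   import org.apache.lucene.analysis.PorterStemFilter;
   import org.apache.lucene.analysis.StopAnalyzer;
   import org.apache.lucene.analysis.StopFilter;
   import org.apache.lucene.analysis.TokenStream;
   import org.apache.lucene.analysis.el.GreekAnalyzer;
   import org.apache.lucene.analysis.standard.StandardFilter;

   public class CombinedAnalyser extends Analyzer {
     private final GreekAnalyzer greek = new GreekAnalyzer();

     public TokenStream tokenStream(String fieldName, Reader reader) {
       // Tokenises with StandardTokenizer and applies the Greek filters
       TokenStream tokens = greek.tokenStream(fieldName, reader);

       // English filters, chained onto the same stream
       tokens = new StandardFilter(tokens);
       tokens = new LowerCaseFilter(tokens);
       // StopFilter needs a stop word list; the stock English list is assumed here
       tokens = new StopFilter(tokens, StopAnalyzer.ENGLISH_STOP_WORDS);
       tokens = new PorterStemFilter(tokens);

       return tokens;
     }
   }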

Another way to go about it would be to detect the Greek fragments of the 
text up-front and pass those fragments through the Greek analyser, and 
anything else through the other analyser.
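
The fragment detection could look roughly like the sketch below.  It is 
plain Java with no Lucene in it, and the names (ScriptSplitter, 
Fragment, split) are made up for illustration.  It assumes Java 7 or 
later for Character.UnicodeScript, decides the language from letters 
only, and leaves spaces, digits and punctuation attached to whichever 
run they follow.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits text into alternating Greek / non-Greek runs.
final class ScriptSplitter {

    /** A run of text whose letters all belong to one side (Greek or not). */
    static final class Fragment {
        final boolean greek;
        final String text;

        Fragment(boolean greek, String text) {
            this.greek = greek;
            this.text = text;
        }
    }

    static boolean isGreekLetter(int codePoint) {
        return Character.UnicodeScript.of(codePoint) == Character.UnicodeScript.GREEK;
    }

    static List<Fragment> split(String text) {
        List<Fragment> fragments = new ArrayList<Fragment>();
        StringBuilder current = new StringBuilder();
        Boolean currentGreek = null; // null until the first letter is seen

        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);

            if (Character.isLetter(cp)) {
                boolean greek = isGreekLetter(cp);
                if (currentGreek == null) {
                    currentGreek = greek;
                } else if (currentGreek.booleanValue() != greek) {
                    // Script changed: close the current run and start a new one.
                    fragments.add(new Fragment(currentGreek.booleanValue(), current.toString()));
                    current.setLength(0);
                    currentGreek = greek;
                }
            }
            // Non-letters never switch runs; they stay with the current one.
            current.appendCodePoint(cp);
        }

        if (current.length() > 0) {
            // Text with no letters at all is treated as non-Greek.
            boolean greek = currentGreek != null && currentGreek.booleanValue();
            fragments.add(new Fragment(greek, current.toString()));
        }
        return fragments;
    }
}
```

Each fragment would then be wrapped in a StringReader and handed to the 
tokenStream method of the matching analyser.  Note that the token 
offsets would come out relative to the fragment rather than the whole 
text, so they would need adjusting if you rely on them (for 
highlighting, say).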

Daniel


-- 
Daniel Noll

Nuix Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia    Ph: +61 2 9280 0699
Web: http://www.nuix.com.au/                        Fax: +61 2 9212 6902
