lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rafael Cunha de Almeida <almeida...@gmail.com>
Subject Re: Using AND with MultiFieldQueryParser
Date Mon, 17 Nov 2008 23:31:09 GMT
On Mon, 17 Nov 2008 16:29:29 -0200
Rafael Cunha de Almeida <almeidaraf@gmail.com> wrote:

> On Mon, 17 Nov 2008 13:07:35 -0200
> Rafael Cunha de Almeida <almeidaraf@gmail.com> wrote:
> 
> > On Thu, 13 Nov 2008 12:12:17 -0500
> > Matthew Hall <mhall@informatics.jax.org> wrote:
> > 
> > > Which Analyzer have you assigned per field?
> > > 
> > > The PerFieldAnalyzerWrapper uses a default analyzer (the one you passed 
> > > during its construction), and then you assign specific analyzers to each 
> > > field that you want to have special treatment.
> > > 
> > > For example:
> > > 
> > >         PerFieldAnalyzerWrapper aWrapper = new PerFieldAnalyzerWrapper(
> > >                 new StandardAnalyzer());
> > >         aWrapper.addAnalyzer("data", new MGIAnalyzer());
> > >         aWrapper.addAnalyzer("sdata", new StemmedMGIAnalyzer());
> > > 
> > > Now, for the fields in question, have you assigned an Analyzer that 
> > > doesn't actually use stopwords? (there are several available in core)  
> > > Or are you perchance using a custom Analyzer that doesn't process stop 
> > > words?
> > >
> > > Could you possibly post your Initialization code for this?  If so I 
> > > think we could be of more help to you.
> > 
> > I wrote this method, which returns me the analyzer:
> >     static public Analyzer getAnalyzer()
> >     {
> >         PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper(
> >             new KeywordAnalyzer());
> >         analyzer.addAnalyzer("placas", new UniqueTokensAnalyzer());
> >         analyzer.addAnalyzer("ano", new UniqueTokensAnalyzer());
> >         analyzer.addAnalyzer("no_reds", new NumberAnalyzer());
> > 
> >         analyzer.addAnalyzer("nomes", new SimpleBrazilianAnalyzer());
> >         analyzer.addAnalyzer("apelidos", new SimpleBrazilianAnalyzer());
> >         analyzer.addAnalyzer("historico", new SimpleBrazilianAnalyzer
> > ()); analyzer.addAnalyzer("modosAcaoCriminosa", new
> > SimpleBrazilianAnalyzer()); analyzer.addAnalyzer("nomeMunicipio", new
> > SimpleBrazilianAnalyzer()); analyzer.addAnalyzer("nomeBairro", new
> > SimpleBrazilianAnalyzer()); analyzer.addAnalyzer("logradouro", new
> > SimpleBrazilianAnalyzer()); analyzer.addAnalyzer("textoComplementar",
> > new SimpleBrazilianAnalyzer());
> > 
> >         return analyzer;
> >     }
> > 
> > SimpleBrazilianAnalyzer is my own analyzer that uses stopwords.
> > 
> > I pass that analyzer to MultiFieldQueryParser together with an array
> > with all the fields, ie. those fields and more.
> > 
> > When I do an AND search I'd like it to ignore stopwords. My best idea
> > so far is to make my own tokenizer and remove stopwords from the
> > search. How does that sound?
> 
> Another related problem, exact search cannot succeed if there's a
> stopword in between terms of the exact search. For instance:
> 	"Ready to ride"
> will never return any documents if "to" is a stopword. Shouldn't the
> stopwords be ignored as if they weren't even there?

Actually, the exact search issue is due to what I believe to be a bug
on BrazilianAnalyzer. I'm going to send the patch to java-dev@ right
now. But the problem with the AND search still holds.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message