lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Elmer <evanchaste...@gmail.com>
Subject Re: MultiFieldQueryParser with default AND and stopfilter
Date Wed, 08 Jun 2011 15:01:06 GMT
Sorry, I made a mistake here:

> Unfortunately, the solution that Erick gave won't do the trick
> > > bq.add(qp.parse("title:(the AND project)", SHOULD))
> > > bq.add(qp.parse("desc:(the AND project)", SHOULD))
> This still won't match documents where both 'the' and 'project' appear
> in DIFFERENT fields (i.e. a document with title: 'Lucene project' and
> desc: 'the open source search software from Apache')

Correction: this will actually match the example query ('the project'),
but this solution won't work if the search query is changed to: 'the
search project', since 'search' is not in the title field.

Br,
Elmer


On Wed, 2011-06-08 at 16:35 +0200, Elmer wrote:
> Thank you,
> 
> I already use the PerFieldAnalyzerWrapper (by Hibernate Search) ;)
> And that's where the problem comes in: different fields using different
> analyzers (some with, some without a stopfilter). For each term
> (tokenized by MFQP itself?), it applies the given analyzer on each
> field. If the analyzer returns no token (occurs on 'the' when using the
> PerFieldAnalyzerWrapper for the desc field), that field will not be
> included in the clause for that term. (see/re-read the example, maybe
> it's more clear what I mean now).
> 
> Unfortunately, the solution that Erick gave won't do the trick
> > > bq.add(qp.parse("title:(the AND project)", SHOULD))
> > > bq.add(qp.parse("desc:(the AND project)", SHOULD))
> This still won't match documents where both 'the' and 'project' appear
> in DIFFERENT fields (i.e. a document with title: 'Lucene project' and
> desc: 'the open source search software from Apache')
> 
> I hope it's clear what I mean :) Otherwise, let me know!
> 
> BR,
> Elmer
> 
> 
> 
> On Wed, 2011-06-08 at 14:42 +0100, Ian Lea wrote:
> > Except that I think he has loads of other fields and wants to keep it simple.
> > 
> > But how about passing a PerFieldAnalyzerWrapper instance as the
> > analyzer to MFQP?  Worth a try.
> > 
> > 
> > --
> > Ian.
> > 
> > 
> > On Wed, Jun 8, 2011 at 2:38 PM, Erick Erickson <erickerickson@gmail.com> wrote:
> > > Could you just construct a BooleanQuery with the
> > > terms against different fields instead of using MFQP?
> > > e.g.
> > >
> > > bq.add(qp.parse("title:(the AND project)", SHOULD))
> > > bq.add(qp.parse("desc:(the AND project)", SHOULD))
> > >
> > > etc...? If your QueryParser was created with a
> > > PerFieldAnalyzerWrapper I think you might get what you
> > > want....
> > >
> > > Note, bad pseudo code there...
> > >
> > > Best
> > > Erick
> > >
> > > On Wed, Jun 8, 2011 at 4:52 AM, Elmer <evanchastelet@gmail.com> wrote:
> > >> Hi,
> > >>
> > >> I have a use case in which I use the MultiFieldQueryParser (MFQP) on
> > >> some fields that use and some fields that don't use a stopfilter. The
> > >> default operator of the MFQP is set to AND.
> > >> For example, if the search query is 'the project' (with 'the' included
> > >> in the stoplist) and the search fields are:
> > >>
> > >> title - not using a stopfilter,
> > >> desc - using a stopfilter,
> > >>
> > >> the parsed query becomes:
> > >>
> > >> '+(title:the) +(title:project desc:project)'.
> > >>
> > >> So, the problem is that docs that have the term 'the' only appearing in
> > >> their desc field are excluded from the results. So every query, with AND
> > >> as default operator, that has a stop word in it that only appears in
> > >> fields that use a stop filter will have this problem (or similar, if
> > >> there is at least one field X not using a stopfilter -> no match if
a
> > >> stopword from query doesn't appear in field X). Thus, in this example,
a
> > >> document with title: 'Lucene project' and desc: 'the open source search
> > >> software from Apache' will not be matched. In my opinion this is not the
> > >> expected behavior. What I'd like to see is that this doc is matched by
> > >> the given query. So, for each token in the query, that appears to be a
> > >> stopword in a field (i.e. some filter filters the token out), I want it
> > >> to be matched instead of not.
> > >>
> > >> Anyone who knows a way to deal with this? I would prefer to keep using
> > >> the MFQP, since I need to support multiple fields, querytime boosting
> > >> and lucene syntax. Or is there a disadvantage by doing this?
> > >>
> > >> Thanks in advance.
> > >>
> > >> BR,
> > >> Elmer van Chastelet
> > >>
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > >> For additional commands, e-mail: java-user-help@lucene.apache.org
> > >>
> > >>
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> > 
> 
> 



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message