Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
Received-SPF: pass (nike.apache.org: domain of evanchastelet@gmail.com
 designates 209.85.161.48 as permitted sender)
Subject: Re: MultiFieldQueryParser with default AND and stopfilter
From: Elmer <evanchastelet@gmail.com>
To: java-user@lucene.apache.org
In-Reply-To: <BANLkTi=Q6FH1YwitsQpvQBxZnDn4ZVnddA@mail.gmail.com>
References: <1307523177.3408.14.camel@elmer-P35-DS3P>
	 <BANLkTinx9YFUSkAQXPxWB-VqCLscfGAR+g@mail.gmail.com>
	 <BANLkTikT5E4gLAcPdC+frVDE4qYY5qwQ9w@mail.gmail.com>
	 <1307543749.15928.20.camel@elmer-P35-DS3P>
	 <1307545266.15928.23.camel@elmer-P35-DS3P>
	 <BANLkTi=Q6FH1YwitsQpvQBxZnDn4ZVnddA@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"
Date: Wed, 08 Jun 2011 17:33:20 +0200
Message-ID: <1307547200.15928.31.camel@elmer-P35-DS3P>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit

> Using MFQP with AND
> everywhere you'll never get a match if some fields don't contain all
> of the search terms.

I'm sorry to say it, but I don't think that's true. Look at how the query
parser parses the following query:
'information retrieval'
--parsed-to-->
+(title:inform description:inform authors.name:information)
+(title:retriev description:retriev authors.name:retrieval)

In plain language: both 'information' and 'retrieval' must appear
somewhere, no matter in which fields.

So if 'information' appears only in the title and 'retrieval' only in
the description, the document still matches (and it does, I just tested it ;))
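To make the semantics of that parsed query concrete, here is a toy Java model. It is not Lucene code; the class and method names are made up for illustration. It treats a document as a map from field name to already-analyzed tokens and ignores stemming ('information' stands in for 'inform'). Each query term must occur in at least one field, which is what each +(...) group expresses.

```java
import java.util.List;
import java.util.Map;

/**
 * Toy model (hypothetical, not Lucene) of:
 *   +(title:t1 description:t1 ...) +(title:t2 description:t2 ...)
 * Every term is required, but it may be found in ANY field.
 */
class AndAcrossFields {

    // True if each query term occurs in at least one of the document's fields.
    static boolean matches(Map<String, List<String>> doc, List<String> terms) {
        for (String term : terms) {
            boolean foundSomewhere = false;
            for (List<String> fieldTokens : doc.values()) {
                if (fieldTokens.contains(term)) {
                    foundSomewhere = true;
                    break;
                }
            }
            if (!foundSomewhere) {
                return false; // one required group has no matching field
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up document: 'information' only in title, 'retrieval' only in description.
        Map<String, List<String>> doc = Map.of(
            "title", List.of("information", "systems"),
            "description", List.of("an", "introduction", "to", "retrieval"));
        System.out.println(matches(doc, List.of("information", "retrieval"))); // true
    }
}
```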

Br,
Elmer


On Wed, 2011-06-08 at 16:19 +0100, Ian Lea wrote:
> Then surely the stop word issue is a red herring.  Using MFQP with AND
> everywhere you'll never get a match if some fields don't contain all
> of the search terms.
> 
> Even if Erick's exact answer won't apply, I suspect that building up a
> composite boolean query is the way to go.
> 
> 
> --
> Ian.
> 
> On Wed, Jun 8, 2011 at 4:01 PM, Elmer <evanchastelet@gmail.com> wrote:
> > Sorry, I made a mistake here:
> >
> >> Unfortunately, the solution that Erick gave won't do the trick
> >> > > bq.add(qp.parse("title:(the AND project)"), BooleanClause.Occur.SHOULD);
> >> > > bq.add(qp.parse("desc:(the AND project)"), BooleanClause.Occur.SHOULD);
> >> This still won't match documents where both 'the' and 'project' appear
> >> in DIFFERENT fields (e.g. a document with title: 'Lucene project' and
> >> desc: 'the open source search software from Apache')
> >
> > Correction: this will actually match the example query ('the project'),
> > but this solution won't work if the search query is changed to 'the
> > search project', since 'search' does not appear in the title field.
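A similar toy model of Erick's per-field approach (hypothetical names, not Lucene code) shows why 'the search project' fails on the example document: a SHOULD clause only matches if one single field contains every term that survives that field's analyzer. The stop list for desc ('the', 'from') and the lower-cased document tokens are assumptions about what the index would hold after analysis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Toy model (hypothetical, not Lucene) of Erick's suggestion:
 *   title:(t1 AND t2 ...)  OR  desc:(t1 AND t2 ...)
 * A document matches only if ONE field contains ALL terms that survive
 * that field's analysis.
 */
class AndWithinOneField {

    // Assumed stop lists: only "desc" uses a stop filter.
    static final Map<String, Set<String>> STOP_WORDS =
        Map.of("title", Set.<String>of(), "desc", Set.of("the", "from"));

    // Query-side analysis for one field: drop that field's stop words.
    static List<String> analyze(String field, List<String> terms) {
        List<String> kept = new ArrayList<>();
        for (String t : terms) {
            if (!STOP_WORDS.getOrDefault(field, Set.<String>of()).contains(t)) {
                kept.add(t);
            }
        }
        return kept;
    }

    // OR over fields of: "this field contains every analyzed term".
    static boolean matches(Map<String, List<String>> doc, List<String> terms) {
        for (Map.Entry<String, List<String>> e : doc.entrySet()) {
            List<String> required = analyze(e.getKey(), terms);
            if (!required.isEmpty() && e.getValue().containsAll(required)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Example document as indexed ('the' and 'from' removed from desc).
        Map<String, List<String>> doc = Map.of(
            "title", List.of("lucene", "project"),
            "desc", List.of("open", "source", "search", "software", "apache"));
        System.out.println(matches(doc, List.of("the", "search", "project"))); // false
    }
}
```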
> >
> > Br,
> > Elmer
> >
> >
> > On Wed, 2011-06-08 at 16:35 +0200, Elmer wrote:
> >> Thank you,
> >>
> >> I already use the PerFieldAnalyzerWrapper (by Hibernate Search) ;)
> >> And that's where the problem comes in: different fields use different
> >> analyzers (some with a stop filter, some without). For each term
> >> (tokenized by MFQP itself?), it applies the given analyzer for each
> >> field. If the analyzer returns no token (which happens for 'the' when
> >> the PerFieldAnalyzerWrapper analyzes it for the desc field), that field
> >> is not included in the clause for that term. (Re-read the example;
> >> maybe it's clearer now what I mean.)
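The clause-building behaviour described above can be sketched as another toy model (not Lucene's actual source; names are hypothetical). For each query term it emits one required group with a clause for every field whose analyzer keeps the term; a field whose stop filter removes the term gets no clause. Skipping a term completely when no field keeps it is an assumption about MFQP made for this sketch.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

/**
 * Toy model (hypothetical, not Lucene) of MFQP with default operator AND:
 * one required group per query term, holding a clause for every field
 * whose analyzer still emits a token for that term.
 */
class MfqpClauseModel {

    static String build(List<String> fields, Map<String, Set<String>> stopWords,
                        List<String> terms) {
        StringJoiner query = new StringJoiner(" ");
        for (String term : terms) {
            StringJoiner group = new StringJoiner(" ", "+(", ")");
            int clauses = 0;
            for (String field : fields) {
                // A stop filter yields no token, so this field gets no clause.
                if (!stopWords.getOrDefault(field, Set.<String>of()).contains(term)) {
                    group.add(field + ":" + term);
                    clauses++;
                }
            }
            if (clauses > 0) { // assumption: a term no field keeps is skipped
                query.add(group.toString());
            }
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, Set<String>> stop = Map.of("desc", Set.of("the"));
        System.out.println(build(List.of("title", "desc"), stop, List.of("the", "project")));
        // prints: +(title:the) +(title:project desc:project)
    }
}
```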
> >>
> >> Unfortunately, the solution that Erick gave won't do the trick
> >> > > bq.add(qp.parse("title:(the AND project)"), BooleanClause.Occur.SHOULD);
> >> > > bq.add(qp.parse("desc:(the AND project)"), BooleanClause.Occur.SHOULD);
> >> This still won't match documents where both 'the' and 'project' appear
> >> in DIFFERENT fields (e.g. a document with title: 'Lucene project' and
> >> desc: 'the open source search software from Apache')
> >>
> >> I hope it's clear what I mean :) Otherwise, let me know!
> >>
> >> BR,
> >> Elmer
> >>
> >>
> >>
> >> On Wed, 2011-06-08 at 14:42 +0100, Ian Lea wrote:
> >> > Except that I think he has loads of other fields and wants to keep it simple.
> >> >
> >> > But how about passing a PerFieldAnalyzerWrapper instance as the
> >> > analyzer to MFQP?  Worth a try.
> >> >
> >> >
> >> > --
> >> > Ian.
> >> >
> >> >
> >> > On Wed, Jun 8, 2011 at 2:38 PM, Erick Erickson <erickerickson@gmail.com> wrote:
> >> > > Could you just construct a BooleanQuery with the
> >> > > terms against different fields instead of using MFQP?
> >> > > e.g.
> >> > >
> >> > > BooleanQuery bq = new BooleanQuery();
> >> > > bq.add(qp.parse("title:(the AND project)"), BooleanClause.Occur.SHOULD);
> >> > > bq.add(qp.parse("desc:(the AND project)"), BooleanClause.Occur.SHOULD);
> >> > >
> >> > > etc...? If your QueryParser was created with a
> >> > > PerFieldAnalyzerWrapper I think you might get what you
> >> > > want....
> >> > >
> >> > > Note, bad pseudo code there...
> >> > >
> >> > > Best
> >> > > Erick
> >> > >
> >> > > On Wed, Jun 8, 2011 at 4:52 AM, Elmer <evanchastelet@gmail.com> wrote:
> >> > >> Hi,
> >> > >>
> >> > >> I have a use case in which I use the MultiFieldQueryParser (MFQP) on
> >> > >> some fields that use a stop filter and some that don't. The
> >> > >> default operator of the MFQP is set to AND.
> >> > >> For example, if the search query is 'the project' (with 'the' included
> >> > >> in the stoplist) and the search fields are:
> >> > >>
> >> > >> title - not using a stopfilter,
> >> > >> desc - using a stopfilter,
> >> > >>
> >> > >> the parsed query becomes:
> >> > >>
> >> > >> '+(title:the) +(title:project desc:project)'.
> >> > >>
> >> > >> So the problem is that docs in which the term 'the' appears only in
> >> > >> their desc field are excluded from the results. Every query with AND
> >> > >> as the default operator that contains a stop word appearing only in
> >> > >> fields that use a stop filter has this problem. More generally, as
> >> > >> long as at least one field does not use a stop filter, each stop
> >> > >> word in the query must appear in one of those unfiltered fields, or
> >> > >> the document won't match. Thus, in this example, a document with
> >> > >> title: 'Lucene project' and desc: 'the open source search software
> >> > >> from Apache' will not be matched. In my opinion this is not the
> >> > >> expected behavior; I'd like this doc to be matched by the given
> >> > >> query. So, for each query token that turns out to be a stop word in
> >> > >> some field (i.e. some filter removes the token), I want it to count
> >> > >> as matched rather than unmatched.
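The behaviour asked for here could be modelled like this (a sketch under assumptions, not Lucene code; names are hypothetical): if any field's analyzer removes a term as a stop word, the term counts as matched in that field, so its required group would always be satisfied and can be dropped. Doing this inside Lucene would presumably mean customizing how MultiFieldQueryParser builds its per-term clauses, for example in a subclass; that part is not shown and is untested.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

/**
 * Toy model (hypothetical, not Lucene) of the requested behaviour: a term
 * that ANY field filters out as a stop word is treated as satisfied, so
 * no required group is emitted for it.
 */
class StopWordTolerantAnd {

    static String build(List<String> fields, Map<String, Set<String>> stopWords,
                        List<String> terms) {
        StringJoiner query = new StringJoiner(" ");
        for (String term : terms) {
            StringJoiner group = new StringJoiner(" ", "+(", ")");
            boolean stoppedSomewhere = false;
            for (String field : fields) {
                if (stopWords.getOrDefault(field, Set.<String>of()).contains(term)) {
                    stoppedSomewhere = true; // counts as a match in this field
                } else {
                    group.add(field + ":" + term);
                }
            }
            if (!stoppedSomewhere) {
                query.add(group.toString()); // ordinary term: still required
            }
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, Set<String>> stop = Map.of("desc", Set.of("the"));
        System.out.println(build(List.of("title", "desc"), stop, List.of("the", "project")));
        // prints: +(title:project desc:project)
    }
}
```

With this rule the example document ('Lucene project' in the title) matches 'the project'. The trade-off is that a stop word then no longer constrains any field, including title, where it was not filtered.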
> >> > >>
> >> > >> Does anyone know a way to deal with this? I would prefer to keep using
> >> > >> the MFQP, since I need to support multiple fields, query-time boosting
> >> > >> and Lucene syntax. Or is there a disadvantage to doing this?
> >> > >>
> >> > >> Thanks in advance.
> >> > >>
> >> > >> BR,
> >> > >> Elmer van Chastelet
> >> > >>
> >> > >>
> >> > >> ---------------------------------------------------------------------
> >> > >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> > >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >> > >>
> >> > >>
> >> > >
> >> > >
> >> > >
> >> >
> >> >
> >>
> >>
> >
> >
> >
> >
> >
> 
> 

