Reply-To: java-user@lucene.apache.org
MIME-Version: 1.0
In-Reply-To: <1307545266.15928.23.camel@elmer-P35-DS3P>
References: <1307523177.3408.14.camel@elmer-P35-DS3P>
 <BANLkTinx9YFUSkAQXPxWB-VqCLscfGAR+g@mail.gmail.com>
 <BANLkTikT5E4gLAcPdC+frVDE4qYY5qwQ9w@mail.gmail.com>
 <1307543749.15928.20.camel@elmer-P35-DS3P>
 <1307545266.15928.23.camel@elmer-P35-DS3P>
From: Ian Lea <ian.lea@gmail.com>
Date: Wed, 8 Jun 2011 16:19:04 +0100
Message-ID: <BANLkTi=Q6FH1YwitsQpvQBxZnDn4ZVnddA@mail.gmail.com>
Subject: Re: MultiFieldQueryParser with default AND and stopfilter
To: java-user@lucene.apache.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Then surely the stop word issue is a red herring.  Using MFQP with AND
everywhere, you'll never get a match if some fields don't contain all
of the search terms.
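A toy model in plain Java (not Lucene code) of per-field AND clauses like Erick's field:(the AND project), combined with SHOULD. The stop list, the whitespace tokenizing and the class and method names are hypothetical stand-ins for the real analyzers; only the desc field applies the stop filter, as in Elmer's example. A document matches only if a single field holds every query term that survives that field's analysis.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of per-field "field:(t1 AND t2 ...)" SHOULD clauses; it does not call Lucene.
class PerFieldAndDemo {
    // Hypothetical stop list; only the "desc" field applies it.
    static final Set<String> DESC_STOP = new HashSet<>(Arrays.asList("the", "a", "an", "and", "of"));

    // Stands in for the field's analyzer: false means the token is filtered out.
    static boolean keeps(String field, String token) {
        return !(field.equals("desc") && DESC_STOP.contains(token));
    }

    // Index-time analysis: lowercase, split on whitespace, apply the field's stop filter.
    static Set<String> analyze(String field, String text) {
        Set<String> tokens = new HashSet<>();
        for (String t : text.toLowerCase().split("\\s+")) {
            if (keeps(field, t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    static Map<String, Set<String>> doc(String title, String desc) {
        Map<String, Set<String>> d = new LinkedHashMap<>();
        d.put("title", analyze("title", title));
        d.put("desc", analyze("desc", desc));
        return d;
    }

    // True only if one field contains every query term its analyzer keeps.
    static boolean perFieldAnd(Map<String, Set<String>> doc, List<String> terms) {
        for (Map.Entry<String, Set<String>> e : doc.entrySet()) {
            boolean anyKept = false;
            boolean allPresent = true;
            for (String term : terms) {
                if (!keeps(e.getKey(), term)) {
                    continue; // removed by this field's analyzer
                }
                anyKept = true;
                if (!e.getValue().contains(term)) {
                    allPresent = false;
                    break;
                }
            }
            if (anyKept && allPresent) {
                return true; // this field's SHOULD clause is satisfied
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> d = doc("Lucene project", "the open source search software from Apache");
        // 'search' is only in desc and 'project' only in title, so no single field has all terms.
        System.out.println(perFieldAnd(d, Arrays.asList("the", "search", "project"))); // false
    }
}
```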

Even if Erick's exact answer won't apply, I suspect that building up a
composite boolean query is the way to go.
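A rough sketch of one such composite query, again as a hypothetical toy model that only prints Lucene query syntax. For each query term it ORs the term across every field whose analyzer keeps it, and it makes that group required only when no field's analyzer drops the term, so a stopword can no longer exclude a document. In real code each group would presumably be a BooleanQuery of TermQuery clauses added with BooleanClause.Occur.MUST or SHOULD; keeps() is an assumed stand-in for running each field's analyzer (e.g. through the PerFieldAnalyzerWrapper), and the stop list is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model that prints Lucene query syntax; it does not call Lucene.
class CompositeQueryDemo {
    // Hypothetical stop list; only the "desc" field applies it.
    static final Set<String> DESC_STOP = new HashSet<>(Arrays.asList("the", "a", "an", "and", "of"));

    // Assumed stand-in for running the field's analyzer: false means the term is filtered out.
    static boolean keeps(String field, String term) {
        return !(field.equals("desc") && DESC_STOP.contains(term));
    }

    // One group per query term, ORing the term across all fields that keep it.
    // The group is required (+) only if no field drops the term; otherwise it is
    // optional, so a stopword in one field cannot exclude a document.
    static String build(List<String> fields, List<String> terms) {
        List<String> clauses = new ArrayList<>();
        for (String term : terms) {
            List<String> parts = new ArrayList<>();
            boolean droppedSomewhere = false;
            for (String field : fields) {
                if (keeps(field, term)) {
                    parts.add(field + ":" + term);
                } else {
                    droppedSomewhere = true;
                }
            }
            if (parts.isEmpty()) {
                continue; // stopword in every field: nothing to search for
            }
            String group = "(" + String.join(" ", parts) + ")";
            clauses.add(droppedSomewhere ? group : "+" + group);
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("title", "desc");
        System.out.println(build(fields, Arrays.asList("the", "search", "project")));
        // prints: (title:the) +(title:search desc:search) +(title:project desc:project)
    }
}
```

Keeping the stopword as an optional clause rather than dropping it entirely means documents that do have 'the' in the title should still score a little higher.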


--
Ian.

On Wed, Jun 8, 2011 at 4:01 PM, Elmer <evanchastelet@gmail.com> wrote:
> Sorry, I made a mistake here:
>
>> Unfortunately, the solution that Erick gave won't do the trick
>> > > bq.add(qp.parse("title:(the AND project)", SHOULD))
>> > > bq.add(qp.parse("desc:(the AND project)", SHOULD))
>> This still won't match documents where both 'the' and 'project' appear
>> in DIFFERENT fields (e.g. a document with title: 'Lucene project' and
>> desc: 'the open source search software from Apache').
>
> Correction: this will actually match the example query ('the project'),
> but this solution won't work if the search query is changed to: 'the
> search project', since 'search' is not in the title field.
>
> Br,
> Elmer
>
>
> On Wed, 2011-06-08 at 16:35 +0200, Elmer wrote:
>> Thank you,
>>
>> I already use the PerFieldAnalyzerWrapper (by Hibernate Search) ;)
>> And that's where the problem comes in: different fields use different
>> analyzers (some with, some without a stopfilter). For each term
>> (tokenized by MFQP itself?), MFQP applies the given analyzer for each
>> field. If the analyzer returns no token (this happens to 'the' when
>> the PerFieldAnalyzerWrapper uses the desc field's analyzer), that field
>> is not included in the clause for that term. (See the example again;
>> maybe it's clearer now what I mean.)
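A simplified model of the expansion described above, written as an assumption about MFQP's effective behavior rather than its actual source: each term becomes a required group over only the fields whose analyzer kept it, so a field that drops 'the' simply vanishes from that term's group. The stop list and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of MFQP's per-term expansion with AND as the default operator.
class MfqpExpansionDemo {
    // Hypothetical stop list; only "desc" uses a stop filter.
    static final Set<String> DESC_STOP = new HashSet<>(Arrays.asList("the", "a", "an", "and", "of"));

    // Stand-in for the field's analyzer: false means no token comes out for this term.
    static boolean keeps(String field, String term) {
        return !(field.equals("desc") && DESC_STOP.contains(term));
    }

    // Each term becomes a required group over only those fields whose analyzer
    // still produces a token for it; a field that drops the term is simply absent.
    static String expand(List<String> fields, List<String> terms) {
        List<String> clauses = new ArrayList<>();
        for (String term : terms) {
            List<String> parts = new ArrayList<>();
            for (String field : fields) {
                if (keeps(field, term)) {
                    parts.add(field + ":" + term);
                }
            }
            if (!parts.isEmpty()) {
                clauses.add("+(" + String.join(" ", parts) + ")");
            }
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(expand(Arrays.asList("title", "desc"), Arrays.asList("the", "project")));
        // prints: +(title:the) +(title:project desc:project)
    }
}
```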
>>
>> Unfortunately, the solution that Erick gave won't do the trick
>> > > bq.add(qp.parse("title:(the AND project)", SHOULD))
>> > > bq.add(qp.parse("desc:(the AND project)", SHOULD))
>> This still won't match documents where both 'the' and 'project' appear
>> in DIFFERENT fields (e.g. a document with title: 'Lucene project' and
>> desc: 'the open source search software from Apache').
>>
>> I hope it's clear what I mean :) Otherwise, let me know!
>>
>> BR,
>> Elmer
>>
>>
>>
>> On Wed, 2011-06-08 at 14:42 +0100, Ian Lea wrote:
>> > Except that I think he has loads of other fields and wants to keep it simple.
>> >
>> > But how about passing a PerFieldAnalyzerWrapper instance as the
>> > analyzer to MFQP?  Worth a try.
>> >
>> >
>> > --
>> > Ian.
>> >
>> >
>> > On Wed, Jun 8, 2011 at 2:38 PM, Erick Erickson <erickerickson@gmail.com> wrote:
>> > > Could you just construct a BooleanQuery with the
>> > > terms against different fields instead of using MFQP?
>> > > e.g.
>> > >
>> > > bq.add(qp.parse("title:(the AND project)"), BooleanClause.Occur.SHOULD);
>> > > bq.add(qp.parse("desc:(the AND project)"), BooleanClause.Occur.SHOULD);
>> > >
>> > > etc...? If your QueryParser was created with a
>> > > PerFieldAnalyzerWrapper I think you might get what you
>> > > want....
>> > >
>> > > Note, bad pseudo code there...
>> > >
>> > > Best
>> > > Erick
>> > >
>> > > On Wed, Jun 8, 2011 at 4:52 AM, Elmer <evanchastelet@gmail.com> wrote:
>> > >> Hi,
>> > >>
>> > >> I have a use case in which I use the MultiFieldQueryParser (MFQP) on
>> > >> some fields that use a stopfilter and some fields that don't. The
>> > >> default operator of the MFQP is set to AND.
>> > >> For example, if the search query is 'the project' (with 'the' included
>> > >> in the stoplist) and the search fields are:
>> > >>
>> > >> title - not using a stopfilter,
>> > >> desc - using a stopfilter,
>> > >>
>> > >> the parsed query becomes:
>> > >>
>> > >> '+(title:the) +(title:project desc:project)'.
>> > >>
>> > >> So, the problem is that docs in which the term 'the' appears only in
>> > >> the desc field are excluded from the results. Every query with AND as
>> > >> the default operator that contains a stop word appearing only in
>> > >> fields that use a stop filter will have this problem. More generally,
>> > >> if there is at least one field X not using a stopfilter, a doc cannot
>> > >> match unless every stop word from the query appears in field X. Thus,
>> > >> in this example, a document with title: 'Lucene project' and desc:
>> > >> 'the open source search software from Apache' will not be matched. In
>> > >> my opinion this is not the expected behavior; I'd like this doc to be
>> > >> matched by the given query. So, for each token in the query that is a
>> > >> stopword in some field (i.e. some filter removes the token), I want
>> > >> that field to count as matching the token instead of being left out.
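Evaluating that parsed query by hand shows why the example document is excluded. This toy check (not Lucene's matching code) flattens the document into hypothetical field:term pairs after analysis, with 'the' already removed from desc by an assumed stop filter, and requires every +(...) group to hit at least one pair. Treating the 'the' group as optional, as Elmer asks, lets the document match.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy check of required (+) groups against one analyzed document; not Lucene code.
class RequiredGroupsDemo {
    // The example doc as indexed field:term pairs (desc assumed stop-filtered, so no "desc:the").
    static final Set<String> DOC = new HashSet<>(Arrays.asList(
            "title:lucene", "title:project",
            "desc:open", "desc:source", "desc:search", "desc:software", "desc:from", "desc:apache"));

    // Every group must contain at least one pair that the document has.
    static boolean matchesAll(Set<String> doc, List<List<String>> requiredGroups) {
        for (List<String> group : requiredGroups) {
            boolean hit = false;
            for (String pair : group) {
                if (doc.contains(pair)) {
                    hit = true;
                    break;
                }
            }
            if (!hit) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // +(title:the) +(title:project desc:project)
        List<List<String>> parsed = Arrays.asList(
                Arrays.asList("title:the"),
                Arrays.asList("title:project", "desc:project"));
        System.out.println(matchesAll(DOC, parsed)); // false: nothing satisfies the 'the' group

        // Same query with the 'the' group no longer required.
        List<List<String>> relaxed = Arrays.asList(
                Arrays.asList("title:project", "desc:project"));
        System.out.println(matchesAll(DOC, relaxed)); // true
    }
}
```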
>> > >>
>> > >> Does anyone know a way to deal with this? I would prefer to keep using
>> > >> the MFQP, since I need to support multiple fields, query-time boosting
>> > >> and Lucene syntax. Or is there a disadvantage to doing this?
>> > >>
>> > >> Thanks in advance.
>> > >>
>> > >> BR,
>> > >> Elmer van Chastelet
>> > >>
>> > >>
>> > >> ---------------------------------------------------------------------
>> > >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > >> For additional commands, e-mail: java-user-help@lucene.apache.org
>> > >>
>> > >>
>> > >
>> >
>>
>>
>
>
>
