From: Ian Lea
Date: Wed, 8 Jun 2011 16:19:04 +0100
Subject: Re: MultiFieldQueryParser with default AND and stopfilter
To: java-user@lucene.apache.org

Then surely the stop word issue is a red herring. Using MFQP with AND
everywhere you'll never get a match if some fields don't contain all
of the search terms. Even if Erick's exact answer won't apply, I
suspect that building up a composite boolean query is the way to go.

--
Ian.

On Wed, Jun 8, 2011 at 4:01 PM, Elmer wrote:
> Sorry, I made a mistake here:
>
>> Unfortunately, the solution that Erick gave won't do the trick
>> > > bq.add(qp.parse("title:(the AND project)", SHOULD))
>> > > bq.add(qp.parse("desc:(the AND project)", SHOULD))
>> This still won't match documents where both 'the' and 'project' appear
>> in DIFFERENT fields (i.e. a document with title: 'Lucene project' and
>> desc: 'the open source search software from Apache')
>
> Correction: this will actually match the example query ('the project'),
> but this solution won't work if the search query is changed to: 'the
> search project', since 'search' is not in the title field.
>
> Br,
> Elmer
>
>
> On Wed, 2011-06-08 at 16:35 +0200, Elmer wrote:
>> Thank you,
>>
>> I already use the PerFieldAnalyzerWrapper (by Hibernate Search) ;)
>> And that's where the problem comes in: different fields using different
>> analyzers (some with, some without a stopfilter). For each term
>> (tokenized by MFQP itself?), it applies the given analyzer on each
>> field. If the analyzer returns no token (occurs on 'the' when using the
>> PerFieldAnalyzerWrapper for the desc field), that field will not be
>> included in the clause for that term. (see/re-read the example, maybe
>> it's more clear what I mean now).
>>
>> Unfortunately, the solution that Erick gave won't do the trick
>> > > bq.add(qp.parse("title:(the AND project)", SHOULD))
>> > > bq.add(qp.parse("desc:(the AND project)", SHOULD))
>> This still won't match documents where both 'the' and 'project' appear
>> in DIFFERENT fields (i.e. a document with title: 'Lucene project' and
>> desc: 'the open source search software from Apache')
>>
>> I hope it's clear what I mean :) Otherwise, let me know!
>>
>> BR,
>> Elmer
>>
>>
>>
>> On Wed, 2011-06-08 at 14:42 +0100, Ian Lea wrote:
>> > Except that I think he has loads of other fields and wants to keep it simple.
>> >
>> > But how about passing a PerFieldAnalyzerWrapper instance as the
>> > analyzer to MFQP? Worth a try.
>> >
>> >
>> > --
>> > Ian.
>> >
>> >
>> > On Wed, Jun 8, 2011 at 2:38 PM, Erick Erickson wrote:
>> > > Could you just construct a BooleanQuery with the
>> > > terms against different fields instead of using MFQP?
>> > > e.g.
>> > >
>> > > bq.add(qp.parse("title:(the AND project)", SHOULD))
>> > > bq.add(qp.parse("desc:(the AND project)", SHOULD))
>> > >
>> > > etc...? If your QueryParser was created with a
>> > > PerFieldAnalyzerWrapper I think you might get what you
>> > > want....
>> > >
>> > > Note, bad pseudo code there...
>> > >
>> > > Best
>> > > Erick
>> > >
>> > > On Wed, Jun 8, 2011 at 4:52 AM, Elmer wrote:
>> > >> Hi,
>> > >>
>> > >> I have a use case in which I use the MultiFieldQueryParser (MFQP) on
>> > >> some fields that use and some fields that don't use a stopfilter. The
>> > >> default operator of the MFQP is set to AND.
>> > >> For example, if the search query is 'the project' (with 'the' included
>> > >> in the stoplist) and the search fields are:
>> > >>
>> > >> title - not using a stopfilter,
>> > >> desc - using a stopfilter,
>> > >>
>> > >> the parsed query becomes:
>> > >>
>> > >> '+(title:the) +(title:project desc:project)'.
>> > >>
>> > >> So, the problem is that docs that have the term 'the' only appearing in
>> > >> their desc field are excluded from the results. So every query, with AND
>> > >> as default operator, that has a stop word in it that only appears in
>> > >> fields that use a stop filter will have this problem (or similar, if
>> > >> there is at least one field X not using a stopfilter -> no match if a
>> > >> stopword from query doesn't appear in field X). Thus, in this example, a
>> > >> document with title: 'Lucene project' and desc: 'the open source search
>> > >> software from Apache' will not be matched. In my opinion this is not the
>> > >> expected behavior. What I'd like to see is that this doc is matched by
>> > >> the given query. So, for each token in the query, that appears to be a
>> > >> stopword in a field (i.e. some filter filters the token out), I want it
>> > >> to be matched instead of not.
>> > >>
>> > >> Anyone who knows a way to deal with this? I would prefer to keep using
>> > >> the MFQP, since I need to support multiple fields, querytime boosting
>> > >> and lucene syntax. Or is there a disadvantage by doing this?
>> > >>
>> > >> Thanks in advance.
>> > >>
>> > >> BR,
>> > >> Elmer van Chastelet
>> > >>
>> > >>
>> > >> ---------------------------------------------------------------------
>> > >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > >> For additional commands, e-mail: java-user-help@lucene.apache.org
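
The composite-query idea from this thread can be sketched roughly as
follows. This is only an illustration in plain Java with no Lucene
dependency, not anything posted by the participants. The class
StopAwareQuerySketch and its build method are hypothetical names, and the
per-field stop lists stand in for whatever analyzers a
PerFieldAnalyzerWrapper would apply (terms are simply lowercased and
split on whitespace here). The sketch assumes the resulting string is
handed to a parser whose default operator is OR, so that a clause without
a '+' prefix is optional rather than required.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper (not a Lucene class): builds one clause per query term. */
final class StopAwareQuerySketch {

    /**
     * @param userQuery      whitespace-separated search terms
     * @param fieldStopwords field name -> stop list (empty set = no stop filter)
     * @return a Lucene-syntax query string
     */
    static String build(String userQuery, Map<String, Set<String>> fieldStopwords) {
        List<String> clauses = new ArrayList<>();
        for (String term : userQuery.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            List<String> perField = new ArrayList<>();
            boolean droppedSomewhere = false;
            for (Map.Entry<String, Set<String>> field : fieldStopwords.entrySet()) {
                if (field.getValue().contains(term)) {
                    // This field's analyzer would emit no token for the term.
                    droppedSomewhere = true;
                } else {
                    perField.add(field.getKey() + ":" + term);
                }
            }
            if (perField.isEmpty()) {
                continue; // stop word in every field: nothing left to search for
            }
            String group = "(" + String.join(" ", perField) + ")";
            // Require the term only if no field filtered it out; otherwise
            // keep it as an optional clause that can still affect scoring.
            clauses.add(droppedSomewhere ? group : "+" + group);
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> fields = new LinkedHashMap<>();
        fields.put("title", Set.of());      // no stop filter
        fields.put("desc", Set.of("the"));  // 'the' is on the stop list
        // prints: (title:the) +(title:project desc:project)
        System.out.println(build("the project", fields));
    }
}
```

With these assumptions, 'the search project' would become
'(title:the) +(title:search desc:search) +(title:project desc:project)',
which the example document (title: 'Lucene project', desc: 'the open
source search software from Apache') should satisfy, since each required
term may match in either field.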