From: Ian Lea
Date: Wed, 8 Jun 2011 16:39:37 +0100
Subject: Re: MultiFieldQueryParser with default AND and stopfilter
To: java-user@lucene.apache.org

I'm sure you are right and I'm wrong - sorry for the waste of space.
However I still think you should build it all up in code.


--
Ian.


On Wed, Jun 8, 2011 at 4:33 PM, Elmer wrote:
>> Using MFQP with AND
>> everywhere you'll never get a match if some fields don't contain all
>> of the search terms
>
> I'm sorry to say, but that's not true I guess, look how the query parser
> parses the following query:
> 'information retrieval'
> --parsed-to-->
> +(title:inform description:inform authors.name:information)
> +(title:retriev description:retriev authors.name:retrieval)
>
> in human language: both 'information' and 'retrieval' should appear
> somewhere, doesn't matter in which fields.
>
> So if 'information' only appears in the title, and 'retrieval' only in
> the description, there is a match (and there is, I just tested it ;))
>
> Br,
> Elmer
>
>
> On Wed, 2011-06-08 at 16:19 +0100, Ian Lea wrote:
>> Then surely the stop word issue is a red herring.
>> Using MFQP with AND
>> everywhere you'll never get a match if some fields don't contain all
>> of the search terms.
>>
>> Even if Erick's exact answer won't apply, I suspect that building up a
>> composite boolean query is the way to go.
>>
>>
>> --
>> Ian.
>>
>> On Wed, Jun 8, 2011 at 4:01 PM, Elmer wrote:
>> > Sorry, I made a mistake here:
>> >
>> >> Unfortunately, the solution that Erick gave won't do the trick
>> >> > > bq.add(qp.parse("title:(the AND project)"), SHOULD)
>> >> > > bq.add(qp.parse("desc:(the AND project)"), SHOULD)
>> >> This still won't match documents where both 'the' and 'project' appear
>> >> in DIFFERENT fields (i.e. a document with title: 'Lucene project' and
>> >> desc: 'the open source search software from Apache')
>> >
>> > Correction: this will actually match the example query ('the project'),
>> > but this solution won't work if the search query is changed to: 'the
>> > search project', since 'search' is not in the title field.
>> >
>> > Br,
>> > Elmer
>> >
>> >
>> > On Wed, 2011-06-08 at 16:35 +0200, Elmer wrote:
>> >> Thank you,
>> >>
>> >> I already use the PerFieldAnalyzerWrapper (by Hibernate Search) ;)
>> >> And that's where the problem comes in: different fields using different
>> >> analyzers (some with, some without a stopfilter). For each term
>> >> (tokenized by MFQP itself?), it applies the given analyzer on each
>> >> field. If the analyzer returns no token (occurs on 'the' when using the
>> >> PerFieldAnalyzerWrapper for the desc field), that field will not be
>> >> included in the clause for that term. (see/re-read the example, maybe
>> >> it's more clear what I mean now).
>> >>
>> >> Unfortunately, the solution that Erick gave won't do the trick
>> >> > > bq.add(qp.parse("title:(the AND project)"), SHOULD)
>> >> > > bq.add(qp.parse("desc:(the AND project)"), SHOULD)
>> >> This still won't match documents where both 'the' and 'project' appear
>> >> in DIFFERENT fields (i.e.
>> >> a document with title: 'Lucene project' and
>> >> desc: 'the open source search software from Apache')
>> >>
>> >> I hope it's clear what I mean :) Otherwise, let me know!
>> >>
>> >> BR,
>> >> Elmer
>> >>
>> >>
>> >>
>> >> On Wed, 2011-06-08 at 14:42 +0100, Ian Lea wrote:
>> >> > Except that I think he has loads of other fields and wants to keep it simple.
>> >> >
>> >> > But how about passing a PerFieldAnalyzerWrapper instance as the
>> >> > analyzer to MFQP?  Worth a try.
>> >> >
>> >> >
>> >> > --
>> >> > Ian.
>> >> >
>> >> >
>> >> > On Wed, Jun 8, 2011 at 2:38 PM, Erick Erickson wrote:
>> >> > > Could you just construct a BooleanQuery with the
>> >> > > terms against different fields instead of using MFQP?
>> >> > > e.g.
>> >> > >
>> >> > > bq.add(qp.parse("title:(the AND project)"), SHOULD)
>> >> > > bq.add(qp.parse("desc:(the AND project)"), SHOULD)
>> >> > >
>> >> > > etc...? If your QueryParser was created with a
>> >> > > PerFieldAnalyzerWrapper I think you might get what you
>> >> > > want....
>> >> > >
>> >> > > Note, bad pseudo code there...
>> >> > >
>> >> > > Best
>> >> > > Erick
>> >> > >
>> >> > > On Wed, Jun 8, 2011 at 4:52 AM, Elmer wrote:
>> >> > >> Hi,
>> >> > >>
>> >> > >> I have a use case in which I use the MultiFieldQueryParser (MFQP) on
>> >> > >> some fields that use and some fields that don't use a stopfilter. The
>> >> > >> default operator of the MFQP is set to AND.
>> >> > >> For example, if the search query is 'the project' (with 'the' included
>> >> > >> in the stoplist) and the search fields are:
>> >> > >>
>> >> > >> title - not using a stopfilter,
>> >> > >> desc - using a stopfilter,
>> >> > >>
>> >> > >> the parsed query becomes:
>> >> > >>
>> >> > >> '+(title:the) +(title:project desc:project)'.
>> >> > >>
>> >> > >> So, the problem is that docs that have the term 'the' only appearing in
>> >> > >> their desc field are excluded from the results.
>> >> > >> So every query, with AND
>> >> > >> as default operator, that has a stop word in it that only appears in
>> >> > >> fields that use a stop filter will have this problem (or similar, if
>> >> > >> there is at least one field X not using a stopfilter -> no match if a
>> >> > >> stopword from the query doesn't appear in field X). Thus, in this example, a
>> >> > >> document with title: 'Lucene project' and desc: 'the open source search
>> >> > >> software from Apache' will not be matched. In my opinion this is not the
>> >> > >> expected behavior. What I'd like to see is that this doc is matched by
>> >> > >> the given query. So, for each token in the query that turns out to be a
>> >> > >> stopword in a field (i.e. some filter filters the token out), I want it
>> >> > >> to be treated as matched instead of not.
>> >> > >>
>> >> > >> Does anyone know a way to deal with this? I would prefer to keep using
>> >> > >> the MFQP, since I need to support multiple fields, query-time boosting
>> >> > >> and Lucene syntax. Or is there a disadvantage to doing this?
>> >> > >>
>> >> > >> Thanks in advance.
>> >> > >>
>> >> > >> BR,
>> >> > >> Elmer van Chastelet
>> >> > >>
>> >> > >>
>> >> > >> ---------------------------------------------------------------------
>> >> > >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> > >> For additional commands, e-mail: java-user-help@lucene.apache.org
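
For illustration, here is a rough sketch of the behaviour discussed in this thread and of one possible way to "build it all up in code". It is a toy model in plain Java and does not use Lucene at all: the class StopAwareSketch, its methods, the stop lists and the sample document are all made up for this example, and there is no stemming. With stopAware=false it mimics the parsed query reported above, where a field whose analyzer drops a term is left out of that term's required clause. With stopAware=true it makes the whole clause optional whenever some field's analyzer drops the term, which is one reading of "I want it to be matched instead of not", and not necessarily what anyone in the thread ended up doing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy model in plain Java. None of these types are Lucene classes.
class StopAwareSketch {

    // One clause per query term: the fields whose analyzer kept the term,
    // and whether the clause is required ('+') or optional.
    static final class Clause {
        final String term;
        final List<String> fields;
        final boolean required;

        Clause(String term, List<String> fields, boolean required) {
            this.term = term;
            this.fields = fields;
            this.required = required;
        }

        @Override
        public String toString() {
            List<String> parts = new ArrayList<>();
            for (String field : fields) parts.add(field + ":" + term);
            return (required ? "+" : "") + "(" + String.join(" ", parts) + ")";
        }
    }

    // Assumed per-field stop lists, standing in for a PerFieldAnalyzerWrapper.
    static Map<String, Set<String>> stopWords() {
        Map<String, Set<String>> stops = new LinkedHashMap<>();
        stops.put("title", new HashSet<String>());               // no stop filter
        stops.put("desc", new HashSet<>(Arrays.asList("the")));  // stop filter
        return stops;
    }

    // Index-time analysis: lowercase, split on non-letters, drop stop words.
    static Set<String> analyze(String field, String text) {
        Set<String> tokens = new HashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty() && !stopWords().get(field).contains(token)) tokens.add(token);
        }
        return tokens;
    }

    // Builds one clause per query term over all fields.
    static List<Clause> build(String query, boolean stopAware) {
        Map<String, Set<String>> stops = stopWords();
        List<Clause> clauses = new ArrayList<>();
        for (String term : query.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (term.isEmpty()) continue;
            List<String> kept = new ArrayList<>();
            for (Map.Entry<String, Set<String>> e : stops.entrySet()) {
                if (!e.getValue().contains(term)) kept.add(e.getKey());
            }
            if (kept.isEmpty()) continue; // dropped by every field: nothing to search
            boolean droppedSomewhere = kept.size() < stops.size();
            clauses.add(new Clause(term, kept, !(stopAware && droppedSomewhere)));
        }
        return clauses;
    }

    static String render(List<Clause> clauses) {
        List<String> parts = new ArrayList<>();
        for (Clause c : clauses) parts.add(c.toString());
        return String.join(" ", parts);
    }

    // Every required clause must hit; if there are none, one optional hit is needed.
    static boolean matches(List<Clause> clauses, Map<String, String> doc) {
        boolean sawRequired = false;
        boolean optionalHit = false;
        for (Clause c : clauses) {
            boolean hit = false;
            for (String field : c.fields) {
                String text = doc.get(field);
                if (text != null && analyze(field, text).contains(c.term)) hit = true;
            }
            if (c.required) {
                sawRequired = true;
                if (!hit) return false;
            } else if (hit) {
                optionalHit = true;
            }
        }
        return sawRequired || optionalHit;
    }

    // The example document from the original question.
    static Map<String, String> sampleDoc() {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Lucene project");
        doc.put("desc", "the open source search software from Apache");
        return doc;
    }

    public static void main(String[] args) {
        for (String q : new String[] {"the project", "the search project"}) {
            for (boolean stopAware : new boolean[] {false, true}) {
                List<Clause> clauses = build(q, stopAware);
                System.out.println(render(clauses) + "  matches=" + matches(clauses, sampleDoc()));
            }
        }
    }
}
```

In this model 'the project' renders as '+(title:the) +(title:project desc:project)' with stopAware=false and does not match the sample document, while with stopAware=true the first clause loses its '+' and the document matches; 'the search project' behaves the same way, with 'search' found in desc and 'project' in title. The trade-off is that a document containing no 'the' at all also matches. In real Lucene code the required/optional distinction would presumably map to BooleanClause.Occur.MUST versus SHOULD on clauses added to a BooleanQuery, with the "did the analyzer drop this term?" check done against the actual per-field analyzer.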