From: Robert Muir
Date: Mon, 13 Dec 2010 14:35:45 -0500
Subject: Re: The logic of QueryParser
To: java-user@lucene.apache.org

On Mon, Dec 13, 2010 at 2:10 PM, Brian Hurt wrote:
> I just encountered an unexpected behavior in query parser. So, if you pass
> in a query that is multiple terms, like "cat hat", the query that is
> returned uses an or between the two term searches, instead of an and. That
> is, the query will return all documents with the given field containing
> either "cat" or "hat". Now, I know about phrase queries, using "\"cat
> hat\"", and I know about +, "+cat +hat". So there are ways to work around
> the problem- the behavior was just unintuitive for me and several others. I
> was just wondering what the logic was for defaulting to or instead of and.
>
> I have googled the mailing list archives and didn't find anything. But if
> this has been discussed to death, please just point me to the threads in the
> archive, rather than stirring up some old flame war. Or just tell me what
> to google for (the terms I've tried haven't yielded anything useful).
> Thanks.
>

Well, it's not quite a pure OR query, since it also incorporates
Similarity.coord(), which boosts documents that contain more of the query
terms.

But to understand the default, imagine a more natural query of "where is
the cat in the hat". The default OR query will still give good results,
including boosting documents that contain both 'cat' and 'hat', but with
AND you would get nothing from a document that for some reason is missing
any of those low-value terms.

However, if your queries are more restricted, maybe you want to either:

1) adjust Similarity.coord() to make this boost better for your app (for
   example, maybe only give a boost if overlap == maxOverlap, and maybe
   play with the amount of boost), or
2) set your QueryParser's default operator to AND with the
   setDefaultOperator() method, but realize this could exclude very
   relevant results that happen to be missing some useless keywords.
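
To make the trade-off concrete, here is a rough sketch in plain Java. It
does not call Lucene at all; it only imitates the two behaviours discussed
above on a toy document. defaultCoord() uses the overlap / maxOverlap ratio
that DefaultSimilarity.coord() returns in Lucene 3.x, as far as I know.
strictCoord(), its partialWeight parameter, the class name and the sample
terms are all made up for illustration; in a real application the
equivalent logic would go into an override of Similarity.coord().

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CoordSketch {

    // Fraction of query terms found in the document (assumed to match
    // what DefaultSimilarity.coord() computes in Lucene 3.x).
    static float defaultCoord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Hypothetical variant of option 1: full weight only when every query
    // term matched, otherwise a fixed smaller weight chosen by the caller.
    static float strictCoord(int overlap, int maxOverlap, float partialWeight) {
        return overlap == maxOverlap ? 1.0f : partialWeight;
    }

    // Number of query terms that occur in the document.
    static int overlap(List<String> queryTerms, Set<String> docTerms) {
        int count = 0;
        for (String term : queryTerms) {
            if (docTerms.contains(term)) {
                count++;
            }
        }
        return count;
    }

    // Default operator OR: one matching term is enough.
    static boolean matchesOr(List<String> queryTerms, Set<String> docTerms) {
        return overlap(queryTerms, docTerms) > 0;
    }

    // Default operator AND: every term must be present.
    static boolean matchesAnd(List<String> queryTerms, Set<String> docTerms) {
        return overlap(queryTerms, docTerms) == queryTerms.size();
    }

    public static void main(String[] args) {
        // Distinct terms of "where is the cat in the hat".
        List<String> query = Arrays.asList("where", "is", "the", "cat", "in", "hat");
        Set<String> doc = new HashSet<>(Arrays.asList("the", "cat", "sat", "on", "hat"));

        int n = overlap(query, doc);
        System.out.println("matches with OR:  " + matchesOr(query, doc));
        System.out.println("matches with AND: " + matchesAnd(query, doc));
        System.out.println("default coord:    " + defaultCoord(n, query.size()));
        System.out.println("strict coord:     " + strictCoord(n, query.size(), 0.1f));
    }
}
```

The toy document contains 'cat' and 'hat' but not 'where', 'is' or 'in',
so it is found under OR and dropped under AND. Under OR, the coord factor
is what decides how much those missing terms cost it in the ranking.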