From: "Yonik Seeley"
To: solr-user@lucene.apache.org
Date: Sun, 5 Oct 2008 14:08:56 -0400
Subject: Re: dismax and long phrases

Hmmm, tricky. I think you've uncovered an algorithmic flaw in DisMax.

Consider 2 fields, f1, f2 and 2 terms foo and bar. For illustration
purposes, here is a query that's structurally equivalent (assuming
mm=100% of terms must match):

+(f1:foo OR f2:foo) +(f1:bar OR f2:bar)

OK, so it says that "foo" must appear in either field and "bar" must
appear in either field. So far so good.

Now consider what happens if "bar" is a stopword for f1... the query
becomes

+(f1:foo OR f2:foo) +(f2:bar)

Oops, now this query is saying that bar *must* appear in f2... it's
more restrictive than the first query.

It appears that dismax is a bit broken when some of the fields have
stopwords and some don't. Offhand, I don't see an easy fix for this
problem.

-Yonik

On Fri, Oct 3, 2008 at 5:44 PM, Jon Drukman wrote:
> i have a document with the following field
>
> Saying goodbye to Norman
>
> if i search for "saying goodbye to norman" with the standard query, it works
> fine. if i specify dismax, however, it does not match.
> here's the output of debugQuery, which I don't understand at all:
>
> saying goodbye to norman
> saying goodbye to norman
> +((DisjunctionMaxQuery((user_name:saying^0.4 |
> description:say | tags:say^0.5 | misc:say^0.3 | group_name:say^1.5 |
> location:saying^0.6 | name:say^1.5)~0.01)
> DisjunctionMaxQuery((user_name:goodbye^0.4 | description:goodby |
> tags:goodby^0.5 | misc:goodby^0.3 | group_name:goodby^1.5 |
> location:goodbye^0.6 | name:goodby^1.5)~0.01)
> DisjunctionMaxQuery((user_name:to^0.4 | location:to^0.6)~0.01)
> DisjunctionMaxQuery((user_name:norman^0.4 | description:norman |
> tags:norman^0.5 | misc:norman^0.3 | group_name:norman^1.5 |
> location:norman^0.6 | name:norman^1.5)~0.01))~4)
> DisjunctionMaxQuery((description:"say goodby norman"~100 | group_name:"say
> goodby norman"~100^1.5 | name:"say goodby norman"~100^1.5)~0.01)
>
> +(((user_name:saying^0.4 | description:say
> | tags:say^0.5 | misc:say^0.3 | group_name:say^1.5 | location:saying^0.6 |
> name:say^1.5)~0.01 (user_name:goodbye^0.4 | description:goodby |
> tags:goodby^0.5 | misc:goodby^0.3 | group_name:goodby^1.5 |
> location:goodbye^0.6 | name:goodby^1.5)~0.01 (user_name:to^0.4 |
> location:to^0.6)~0.01 (user_name:norman^0.4 | description:norman |
> tags:norman^0.5 | misc:norman^0.3 | group_name:norman^1.5 |
> location:norman^0.6 | name:norman^1.5)~0.01)~4) (description:"say goodby
> norman"~100 | group_name:"say goodby norman"~100^1.5 | name:"say goodby
> norman"~100^1.5)~0.01
>
> it works fine if I search for "say goodbye" or "saying goodbye" or "saying
> goodbye norman". how can i get it to do exact matches (which should score
> very high)?
>
> -jsd-
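
For anyone who wants to play with the f1/f2 example from the reply at the top of this message, here is a rough Python sketch. It is a toy model only, not Solr or Lucene code: the stopword table and the analyze, index, dismax_clauses and matches helpers are all made up for illustration, and mm=100% is assumed (every clause is required).

```python
# Toy model of the stopword problem; NOT Solr/Lucene code.
# Hypothetical per-field stopword lists: "bar" is a stopword for f1 only.
STOPWORDS = {"f1": {"bar"}, "f2": set()}


def analyze(field, text):
    """Lowercase, split on whitespace, drop that field's stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS[field]]


def index(doc):
    """Store the analyzed tokens of each field as a set."""
    return {field: set(analyze(field, text)) for field, text in doc.items()}


def dismax_clauses(query, fields):
    """Build one required clause per query term.

    Each clause lists the (field, token) alternatives that survive
    that field's analysis. A term removed by every field yields no clause.
    """
    clauses = []
    for term in query.split():
        alternatives = [(f, tok) for f in fields for tok in analyze(f, term)]
        if alternatives:
            clauses.append(alternatives)
    return clauses


def matches(indexed, clauses):
    """mm=100%: every clause needs at least one matching alternative."""
    return all(any(tok in indexed[f] for f, tok in clause) for clause in clauses)


# The document only has text in f1, where "bar" is dropped at index time.
doc = index({"f1": "foo bar", "f2": ""})

# Across both fields, the "bar" clause shrinks to f2:bar alone.
clauses = dismax_clauses("foo bar", ["f1", "f2"])
print(clauses)
print(matches(doc, clauses))

# Against f1 alone, "bar" is dropped on both sides and the document matches.
print(matches(doc, dismax_clauses("foo bar", ["f1"])))
```

In this model the two-field query ends up with a second clause containing only ("f2", "bar"), so the document fails to match even though the same query against f1 alone succeeds. That is the same shape as the (user_name:to | location:to) clause in the debugQuery output above.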