From: Paul Elschot
To: java-user@lucene.apache.org
Subject: Re: PhraseQuery issues - differences with SpanNearQuery
Date: Fri, 5 Sep 2008 01:21:35 +0200

On Thursday 04 September 2008 20:39:13, Mark Miller wrote:
> Sounds like it's more in line with what you are looking for. If I
> remember correctly, the phrase query factors in the edit distance in
> scoring, but the SpanNearQuery will just use the combined idf for
> each of the terms in it, so distance shouldn't matter with spans (I'm
> sure Paul will correct me if I am wrong).

SpanScorer will use the similarity slop factor for each matching span
size to adjust the effective frequency.
The span size is the difference in position between the first and last
matching term, and idf is not used for scoring Spans.
The reason why idf is not used could be that there is no basic score
value associated with inner spans; only top level spans are scored
by SpanScorer.
For more details, please consult the SpanScorer code.

Regards,
Paul Elschot

>
> - Mark
>
> Yannis Pavlidis wrote:
> > Hi,
> >
> > I am having an issue when using the PhraseQuery which is best
> > illustrated with this example:
> >
> > I have created 2 documents to emulate URLs. One with a URL of:
> > "http://www.airballoon.com" and title "air balloon" and the second
> > one with URL "http://www.balloonair.com" and title: "balloon air".
> >
> > Test1 (PhraseQuery)
> > ======
> > Now when I use the phrase query with - title: "air balloon" ~2
> > I get back:
> >
> > url: "http://www.airballoon.com" - score: 1.0
> > url: "http://www.balloonair.com" - score: 0.57
> >
> > Test2 (PhraseQuery)
> > ======
> > Now when I use the phrase query with - title: "balloon air" ~2
> > I get back:
> > url: "http://www.balloonair.com" - score: 1.0
> > url: "http://www.airballoon.com" - score: 0.57
> >
> > Test3 (PhraseQuery)
> > ======
> > Now when I use the phrase query with - title: "air balloon" ~2
> > title: "balloon air" ~2 I get back:
> > url: "http://www.airballoon.com" - score: 1.0
> > url: "http://www.balloonair.com" - score: 1.0
> >
> > Test4 (SpanNearQuery)
> > =======
> > spanNear([title:air, title:balloon], 2, false)
> > I get back:
> > url: "http://www.airballoon.com" - score: 1.0
> > url: "http://www.balloonair.com" - score: 1.0
> >
> > I would have expected that Test1, Test2 would actually return both
> > URLs with score of 1.0 since I am setting the slop to 2. It seems
> > though that Lucene really favors an absolute exact match.
> >
> > Is it safe to assume that for what I am looking for (basically
> > score the docs the same regardless of whether someone is searching
> > for "air balloon" or "balloon air") it would be better to use the
> > SpanNearQuery rather than the PhraseQuery?
> >
> > Any input would be appreciated.
> >
> > Thanks in advance,
> >
> > Yannis.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
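
The arithmetic behind the scores in this thread can be sketched as
follows. This is a rough, stand-alone illustration and not Lucene code:
it assumes the DefaultSimilarity defaults of that era,
sloppyFreq(distance) = 1 / (distance + 1) and tf(freq) = sqrt(freq), and
the class SlopScoringSketch and its method matchFactor are hypothetical
names. With a sloppy PhraseQuery, reversing two terms costs an edit
distance of 2, so the reversed title gets a factor of sqrt(1/3), about
0.577, which would be consistent with the 0.57 reported in Test1 and
Test2. With an unordered SpanNearQuery both titles produce a span of the
same size, so the factor is the same for both documents, as in Test4.

```java
// Sketch only: re-implements two formulas as assumed from the
// Lucene 2.x DefaultSimilarity defaults; it does not call Lucene.
final class SlopScoringSketch {

    // Assumed default: sloppyFreq(distance) = 1 / (distance + 1).
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Assumed default: tf(freq) = sqrt(freq).
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // tf contribution of one match at the given edit distance or span size.
    static float matchFactor(int distance) {
        return tf(sloppyFreq(distance));
    }

    public static void main(String[] args) {
        // PhraseQuery "air balloon"~2: the exact order needs 0 moves,
        // the reversed order needs 2 moves.
        System.out.println("phrase, exact order:    " + matchFactor(0));
        System.out.println("phrase, reversed order: " + matchFactor(2));

        // Unordered SpanNearQuery: both titles give the same span size.
        // The value 1 is a placeholder; only the equality matters here.
        int spanSize = 1;
        System.out.println("span, air balloon: " + matchFactor(spanSize));
        System.out.println("span, balloon air: " + matchFactor(spanSize));
    }
}
```

Running main prints the factor for each case. The span size of 1 used
there is only a placeholder, since the point is that it is identical for
both documents; the actual value SpanScorer passes to the slop factor
should be checked in the SpanScorer code.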