Date: Wed, 9 Sep 2009 06:44:45 +0300
Subject: Re: Is there way to get complete start end matches to be first in the list ?
From: Shai Erera
To: java-user@lucene.apache.org, paul_t100@fastmail.fm

I can think of a way that relies solely on scores, so there is still a
chance that the results will not be ordered exactly the way you want, but
you can try it: run the query [foo bar OR "foo bar"^10]. That way, your
first result should be scored by [foo], [bar] and ["foo bar"]. The phrase
also carries a score modifier so that it will score 10 times more than
regular matches. You can change it to 2, 5, 100, whichever works for you.

There is still some remote chance that the above approach will rank a
"bar foo" higher than "foo bar", but that depends on your content. If you
don't think this can happen in your content, then I'd try the above query.

Shai

On Wed, Sep 9, 2009 at 1:30 AM, Paul Taylor wrote:

> Michael Barbarelli wrote:
>
>> What I do is run each entry in the hits collection through a home-rolled
>> Levenshtein distance algorithm to obtain a score. Then I sort by score.
>>
>> On Sep 8, 2009 9:44 PM, "Paul Taylor" <paul_t100@fastmail.fm> wrote:
>>>
>>> Is there way to get complete start end matches to be first in the list
>>>
>>> We use Lucene to search song album titles, typically one to ten words
>>> long. If the user enters something like 'foo bar', everything that
>>> contains foo bar is returned with max score. That's fine, but it would
>>> be better if an exact match were right at the top. Also, although an OR
>>> search has been entered, it would be great if it ranked matches where
>>> both words are together higher than matches where they are not, while
>>> still returning results that only match one condition.
>>>
>>> Ideally giving results in this order
>>>
>>> * Foo Bar (exact match)
>>> * The Foo Bar Somethings (substring - exact match)
>>> * Bar Foo (all terms match)
>>> * Bar Baz and the Foo (substring - all terms match)
>>> * Foo (some terms match)
>>> * Foo Something (substring - some terms match)
>>>
>>> Is there something I can do in Lucene, or some way I can modify the
>>> query (as entered by the user), to get results better approaching this?
>>>
>>> Paul
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
> That sounds like the right algorithm, but can't this be done within
> Lucene? The trouble is, say I get 1000 hits and I only want the first 10:
> if I only apply the algorithm to the first ten, it might miss out on the
> 11th, which should really be the 5th, but if I have to get all 1000 docs
> and apply the algorithm, it's going to be a bit of an overhead.
>
> Code excerpt might make it clearer:
>
> // Collect the top (offset + limit) hits.
> TopScoreDocCollector collector = TopScoreDocCollector.create(offset + limit, true);
> searcher.search(parser.parse(query), collector);
> Results results = new Results();
> TopDocs topDocs = collector.topDocs();
> results.offset = offset;
> results.totalHits = topDocs.totalHits;
> ScoreDoc docs[] = topDocs.scoreDocs;
> float maxScore = topDocs.getMaxScore();
> // Skip the first `offset` hits so only the requested page is returned.
> for (int i = offset; i < docs.length; i++) {
>     Result result = new Result();
>     // Normalize each score against the best hit's score.
>     result.score = docs[i].score / maxScore;
>     result.doc = new MbDocument(searcher.doc(docs[i].doc));
>     results.results.add(result);
> }
> return results;
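
For anyone who wants to build the boosted query suggested at the top of this message ([foo bar OR "foo bar"^10]) from raw user input, here is a minimal sketch in plain Java. The class and method names (BoostedPhraseQuery, build) are made up for illustration and are not part of Lucene. The helper only assembles the query string, which you would then pass to your own parser.parse(...) call. It assumes the input is plain words with no query-syntax characters such as quotes or colons.

```java
final class BoostedPhraseQuery {
    private BoostedPhraseQuery() {}

    /**
     * Returns the user's terms OR'ed with the same terms as a boosted phrase,
     * e.g. foo bar OR "foo bar"^10.
     * Hypothetical helper: assumes plain words, no query-syntax characters.
     */
    static String build(String userInput, int boost) {
        if (boost < 1) {
            throw new IllegalArgumentException("boost must be >= 1");
        }
        // Collapse runs of whitespace so the phrase uses the same words as the term list.
        String terms = userInput.trim().replaceAll("\\s+", " ");
        if (terms.isEmpty()) {
            throw new IllegalArgumentException("query must not be empty");
        }
        return terms + " OR \"" + terms + "\"^" + boost;
    }
}
```

Calling BoostedPhraseQuery.build("foo bar", 10) gives the string foo bar OR "foo bar"^10. The boost is a parameter so that 2, 5 or 100 can be tried against real content.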
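
The six-step ordering in the original question can also be written down as a post-processing step, in the spirit of the sort-after-search approach mentioned in the thread. The sketch below is plain Java and does not run inside Lucene: TitleTiers, tokens, tier and rerank are hypothetical names, titles are treated as bare strings, and tokenizing is a simple lowercase whitespace split. It only re-orders whatever candidate list it is given, so the concern raised above still applies: a hit outside the fetched window cannot be promoted.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

final class TitleTiers {
    private TitleTiers() {}

    // Lowercase and split on whitespace (assumption: no stemming or punctuation handling).
    static List<String> tokens(String text) {
        String t = text.trim().toLowerCase(Locale.ROOT);
        return t.isEmpty() ? Collections.<String>emptyList() : Arrays.asList(t.split("\\s+"));
    }

    /** Lower tier = better match; tiers 0..5 follow the order in the question, 6 = no match. */
    static int tier(String query, String title) {
        List<String> q = tokens(query);
        List<String> t = tokens(title);
        if (t.equals(q)) return 0;                              // Foo Bar
        if (Collections.indexOfSubList(t, q) >= 0) return 1;    // The Foo Bar Somethings
        boolean titleHasAllTerms = t.containsAll(q);
        boolean titleHasOnlyQueryTerms = q.containsAll(t);
        if (titleHasAllTerms) {
            return titleHasOnlyQueryTerms ? 2 : 3;              // Bar Foo / Bar Baz and the Foo
        }
        if (Collections.disjoint(t, q)) return 6;               // nothing in common
        return titleHasOnlyQueryTerms ? 4 : 5;                  // Foo / Foo Something
    }

    // List.sort is stable, so the incoming (score) order is kept within each tier.
    static void rerank(String query, List<String> titles) {
        titles.sort(Comparator.comparingInt((String title) -> tier(query, title)));
    }
}
```

In a real search path the list would hold the hit objects rather than strings, and the window passed to rerank would need to be larger than the page size for the tiers to have any effect on the first page.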