From: Anshum
To: java-user@lucene.apache.org
Date: Wed, 17 Dec 2008 10:34:19 +0530
Subject: Re: IDF scoring issue

Hi Rajiv,

If I'm interpreting your problem correctly, I'd suggest trying a
PhraseQuery with an appropriate slop value. Though again, it depends on
what exactly you are trying to fetch.

--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com

The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw............

On Wed, Dec 17, 2008 at 9:13 AM, Rajiv2 wrote:

> To answer your questions,
> 1. There are only two words in each document I'm searching -- city and
> state abbrev., lowercased and analyzed by WhitespaceAnalyzer.
> 2. The only field and default field is text, so the query becomes
> text:fleming text:roofing text:inc. ...etc.
> Using query operator AND instead of OR gives me no results, which does
> not help.
> 3. I've been using explain in Luke, and the only difference between
> "fleming ga" and "marietta ga" is that the idf value is higher for
> "fleming" ... that's why "fleming ga" has a higher score.
>
> Basically I'm just trying to get the "marietta ga" doc to score higher.
> In the query text the two words are closer together than "fleming" and
> "ga".
>
> rajiv
>
>
> Erick Erickson wrote:
> >
> > Note a couple of things:
> >
> > 1> How a doc scores also takes into account how many other words
> > are in the field you're querying on.
> > 2> Is "text" your default field? Because what you posted is really
> > searching text:fleming <default field>:roofing <default field>:inc......
> > Note also the implicit OR between each of them. Is this really your
> > intent?
> > 3> query.explain (as I remember) is your friend to figure out how the
> > weights are being calculated. If you haven't got a copy of Luke, I'd
> > *strongly* advise getting one and looking at the "explain" tab...
> >
> > Best
> > Erick
> >
> > On Tue, Dec 16, 2008 at 8:19 PM, Rajiv2 wrote:
> >
> >> Hello,
> >>
> >> I'm using the default Lucene QueryParser on the search text:
> >> fleming roofing inc., marietta ga
> >>
> >> These items are in my index.
> >>
> >> doc 1: fleming ga
> >> doc 2: marietta ga
> >> doc 3: marietta il
> >> doc 4: marietta ok
> >> doc 5: marietta ok
> >> doc 6: fleming pa
> >>
> >> The first match is always "fleming ga" even though "marietta ga" is
> >> closer together in the search text. I'm assuming this is because
> >> "fleming" has a higher idf than "marietta". What should I change in
> >> the way I'm querying or indexing to make "marietta ga" match first?
> >>
> >> Also, I don't want to modify the search text by putting quotes around
> >> "marietta ga", which forces the query parser to make a phrase query.
> >>
> >> thanks,
> >> Rajiv
> >> --
> >> View this message in context:
> >> http://www.nabble.com/IDF-scoring-issue-tp21045385p21045385.html
> >> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
>
> --
> View this message in context:
> http://www.nabble.com/IDF-scoring-issue-tp21045385p21046615.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
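
For reference, the idf gap Rajiv describes can be reproduced by hand from the six documents listed in the thread. This is a minimal sketch, assuming the classic DefaultSimilarity formula, idf = 1 + ln(numDocs / (docFreq + 1)), used by Lucene releases of this period. It does not call Lucene itself, and the names IdfSketch, docFreq and idf are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: recomputes idf by hand, it is not Lucene's own code.
class IdfSketch {
    // The six two-word documents from the original post.
    static final List<String> DOCS = Arrays.asList(
        "fleming ga", "marietta ga", "marietta il",
        "marietta ok", "marietta ok", "fleming pa");

    // Number of documents containing the term (whitespace-tokenized).
    static int docFreq(String term) {
        int count = 0;
        for (String doc : DOCS) {
            if (Arrays.asList(doc.split(" ")).contains(term)) {
                count++;
            }
        }
        return count;
    }

    // Assumed formula: idf = 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        for (String term : new String[] {"fleming", "marietta", "ga"}) {
            int df = docFreq(term);
            System.out.println(term + " docFreq=" + df
                + " idf=" + idf(df, DOCS.size()));
        }
    }
}
```

Under that assumption "fleming" (2 of 6 docs) gets an idf of about 1.69 and "marietta" (4 of 6 docs) about 1.18, while "ga" contributes equally to both documents. So with plain OR'ed term queries "fleming ga" outranks "marietta ga" regardless of how close the words are in the search text, which is why a proximity-aware clause such as a sloppy PhraseQuery is suggested above.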