From: liat oren
To: java-user@lucene.apache.org
Date: Thu, 14 May 2009 11:32:25 +0300
Subject: Re: Boosting query - debuging

No. As I wrote above, finlin contains 6621468 * 6 and 5265266 * 12, and TTD
contains 6621468 * 3 (in both cases I use payloads for the weights).
I search for 6621468 * 3, and finlin gets the higher score.

2009/5/13 Grant Ingersoll

> On May 13, 2009, at 3:04 AM, liat oren wrote:
>
>> Thanks a lot, Grant. Yes, this is the case: finlin is longer than TTD.
>> Can you also explain why, in finlin, we have doc 35433 and in TTD it is 20?
>> Are these the numbers of documents that contain any of the elements in each
>> word?
>
> My understanding is that 35,433 is the combination of the length of the
> document (the one you are "explaining") plus any boosts that you applied,
> and it would also factor in any custom similarity.
>
> So, how many tokens are in each of those documents?
>
>> So if word TTD contains only 6621468, then 20 is the number of documents
>> (words) that contain 6621468?
>> I don't think this is the case: I checked, and the index doesn't have
>> 35433 documents that contain 6621468 or 5265266.
>>
>> 2009/5/11 Grant Ingersoll
>>
>>> On May 10, 2009, at 5:59 AM, liat oren wrote:
>>>
>>>> The output is the following:
>>>>
>>>> *finlin, score: 19.366615*
>>>> 19.366615 = (MATCH) fieldWeight(worlds:6621468^3.0 in 35433), product of:
>>>>   4.2426405 = (MATCH) btq, product of:
>>>>     0.70710677 = tf(phraseFreq=0.5)
>>>>     6.0 = scorePayload(...)
>>>>   7.3036084 = idf(worlds: 6621468=110)
>>>>   0.625 = fieldNorm(field=worlds, doc=35433)
>>>>
>>>> *TTD, score: 15.493294*
>>>> 15.493293 = (MATCH) fieldWeight(worlds:6621468^3.0 in 20), product of:
>>>>   2.1213202 = (MATCH) btq, product of:
>>>>     0.70710677 = tf(phraseFreq=0.5)
>>>>     3.0 = scorePayload(...)
>>>>   7.3036084 = idf(worlds: 6621468=110)
>>>>   1.0 = fieldNorm(field=worlds, doc=20)
>>>>
>>>> Can anyone explain the highlighted parts of the score to me?
>>>> I read all the explanations in the API docs and a lot of threads about
>>>> scoring, but didn't really understand these factors.
>>>> Why, in finlin, do we have doc 35433, and in TTD, 20?
>>>
>>> http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/search/Similarity.html
>>>
>>> fieldNorm = norm (not sure why the docs aren't consistent). The norm takes
>>> into account document length and boosts
>>> (http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/search/Similarity.html#formula_norm).
>>>
>>> The gist of what you are seeing, I believe, is that finlin is a lot longer
>>> than TTD. Is that the case?
>>>
>>> --------------------------
>>> Grant Ingersoll
>>> http://www.lucidimagination.com/
>>>
>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>>> Solr/Lucene:
>>> http://www.lucidimagination.com/search
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
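
The factors in the two explain outputs in this thread multiply out exactly as printed. The sketch below re-multiplies them, assuming Lucene 2.4's `DefaultSimilarity` (where `tf(freq)` is `sqrt(freq)`) and assuming the `btq` clause is a `BoostingTermQuery`, which scores like a `SpanTermQuery`: each span match adds `sloppyFreq(matchLength) = 1 / (matchLength + 1)`, and a single-term span has length 1, which would explain `phraseFreq=0.5` for a single occurrence. `ExplainCheck` and its methods are hypothetical names; the numbers are copied from the output.

```java
/** Hypothetical helper that re-multiplies the factors printed by explain(). */
public class ExplainCheck {

    // DefaultSimilarity-style tf: square root of the (phrase) frequency.
    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    // Assumed span scoring: each match contributes 1 / (matchLength + 1).
    // A single-term span covers one position, so matchLength = 1 gives 0.5.
    static double sloppyFreq(int matchLength) {
        return 1.0 / (matchLength + 1);
    }

    // fieldWeight = btq * idf * fieldNorm, where btq = tf * scorePayload,
    // mirroring the nesting of the explain output.
    static double fieldWeight(double payloadScore, double idf, double fieldNorm) {
        double btq = tf(sloppyFreq(1)) * payloadScore;
        return btq * idf * fieldNorm;
    }

    public static void main(String[] args) {
        double idf = 7.3036084; // idf(worlds: 6621468=110), same for both docs
        double finlin = fieldWeight(6.0, idf, 0.625); // payload 6, fieldNorm 0.625
        double ttd = fieldWeight(3.0, idf, 1.0);      // payload 3, fieldNorm 1.0
        System.out.printf("finlin = %.4f%n", finlin);
        System.out.printf("TTD    = %.4f%n", ttd);
        System.out.printf("ratio  = %.2f%n", finlin / ttd);
    }
}
```

This prints 19.3666 and 15.4933, matching the explain values to four decimals. Since both documents share the same idf, the ratio is (6 × 0.625) / (3 × 1.0) = 1.25: the larger payload on finlin outweighs its smaller fieldNorm. The `^3.0` in the query is a query boost, not a payload value to be matched; for a single-clause query, the default `queryNorm` scales the query weight (boost × idf) back to 1, which is presumably why only `fieldWeight` is printed and why the boost does not change the ranking.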
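
Two points may help with the `doc` and `fieldNorm` questions. In Lucene's explain output, the number after `in` in `fieldWeight(... in 35433)` and the `doc=` value in `fieldNorm(...)` are the internal document number passed to `explain()`, not a length or a document count. The `fieldNorm` values themselves can be reproduced with a sketch assuming `DefaultSimilarity.lengthNorm = 1/sqrt(numTerms)`, no index-time boosts, and the one-byte encoding of Lucene 2.4's `SmallFloat.floatToByte315` (re-implemented here so the code runs without Lucene on the classpath; `NormCheck` and its methods are hypothetical names). Because the encoding truncates, 1/sqrt(2) ≈ 0.707 is stored as 0.625.

```java
/**
 * Hypothetical sketch of how fieldNorm values arise, assuming
 * lengthNorm = 1/sqrt(numTerms), no boosts, and a one-byte float
 * encoding with 3 mantissa bits and exponent offset 15.
 */
public class NormCheck {

    // Length normalization for a field with numTerms tokens (no boosts).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Encode a float into one byte; extra precision is truncated, not rounded.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        int zero = (63 - 15) << 3;
        if (small < zero) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // underflow
        }
        if (small >= zero + 0x100) {
            return (byte) -1; // overflow: largest representable value
        }
        return (byte) (small - zero);
    }

    // Decode the byte back to the float that scoring actually sees.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // The value explain() would report as fieldNorm, under these assumptions.
    static float fieldNorm(int numTerms) {
        return decode(encode(lengthNorm(numTerms)));
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 4; n++) {
            System.out.printf("%d token(s): lengthNorm=%.4f fieldNorm=%s%n",
                    n, lengthNorm(n), fieldNorm(n));
        }
    }
}
```

Under those assumptions, a fieldNorm of 1.0 corresponds to one token and 0.625 to two, which is consistent with TTD holding only 6621468 and finlin holding 6621468 and 5265266. Three and four tokens both decode to 0.5, so the stored norm cannot always distinguish field lengths.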