From: Upayavira
To: solr-user@lucene.apache.org
Subject: Re: Scoring changes between 4.10 and 5.5
Date: Fri, 10 Jun 2016 09:29:08 +0100

Tracked it down to this ticket:

https://issues.apache.org/jira/browse/LUCENE-6590

which changed the implementation of normalize() in
org.apache.lucene.search.similarities.TFIDFSimilarity. I've asked for
comment on that ticket.

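
For reference, the arithmetic in the explain output quoted below can be reproduced with a few lines of Python. This is only a sketch: it assumes the classic TF-IDF formulas (idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq)), and the function names are made up for illustration, not Lucene API. It says nothing about what normalize() does internally; it only shows that the 4.10 score is the field weight alone, while the 5.5 score is query weight times field weight.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed classic TF-IDF idf: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    # Assumed classic TF-IDF tf: square root of the term frequency.
    return math.sqrt(freq)


def field_weight(freq: float, doc_freq: int, max_docs: int,
                 field_norm: float = 1.0) -> float:
    # fieldWeight = tf * idf * fieldNorm, as listed in the explain output.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


def query_weight(doc_freq: int, max_docs: int,
                 query_norm: float = 1.0) -> float:
    # queryWeight = idf * queryNorm, as listed in the 5.5 explain output.
    return classic_idf(doc_freq, max_docs) * query_norm


# 4.10 explain: the score is the field weight alone.
score_410 = field_weight(1.0, doc_freq=56010, max_docs=5568765)

# 5.5 explain: the score is queryWeight * fieldWeight, i.e. tf * idf^2.
score_55 = (query_weight(doc_freq=31905, max_docs=2446459)
            * field_weight(1.0, doc_freq=31905, max_docs=2446459))

print(f"4.10 score: {score_410:.4f}")  # approximately 5.5994
print(f"5.5 score:  {score_55:.4f}")   # approximately 28.5114
```

Lucene computes these values in single precision, so the last digits may differ slightly from the explain output.
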
Upayavira

On Fri, 10 Jun 2016, at 01:39 AM, Ahmet Arslan wrote:
> Hi,
>
> I wondered the same before and failed to decipher TFIDFSimilarity.
> Scoring looks like tf*idf*idf to me.
>
> I appreciate someone who will shed some light on this.
>
> Thanks,
> Ahmet
>
>
>
> On Friday, June 10, 2016 12:37 AM, Upayavira wrote:
> I've just done a very simple, single term query against a 4.10 system
> and a 5.5 system, each with much the same data.
>
> The score for the 4.10 system was essentially made up of the field
> weight, which is:
>    score = tf * idf
>
> Whereas, in the 5.5 system, there is an additional "query weight", which
> is idf * query norm. If query norm is 1, then the final score is now:
>    score = query_weight * field_weight
>          = ( idf * 1 ) * (tf * idf)
>          = tf * idf^2
>
> Can anyone explain why this new "query weight" element has appeared in
> our scores somewhere between 4.10 and 5.5?
>
> Thanks!
>
> Upayavira
>
> 4.10 score ========================================================
> "2937439": {
>   "match": true,
>   "value": 5.5993805,
>   "description": "weight(description:obama in 394012) [DefaultSimilarity], result of:",
>   "details": [
>     {
>       "match": true,
>       "value": 5.5993805,
>       "description": "fieldWeight in 394012, product of:",
>       "details": [
>         {
>           "match": true,
>           "value": 1,
>           "description": "tf(freq=1.0), with freq of:",
>           "details": [
>             {
>               "match": true,
>               "value": 1,
>               "description": "termFreq=1.0"
>             }
>           ]
>         },
>         {
>           "match": true,
>           "value": 5.5993805,
>           "description": "idf(docFreq=56010, maxDocs=5568765)"
>         },
>         {
>           "match": true,
>           "value": 1,
>           "description": "fieldNorm(doc=394012)"
>         }
>       ]
>     }
>   ]
>
> 5.5 score ========================================================
> "2502281":{
>   "match":true,
>   "value":28.51136,
>   "description":"weight(description:obama in 43472) [], result of:",
>   "details":[{
>       "match":true,
>       "value":28.51136,
>       "description":"score(doc=43472,freq=1.0), product of:",
>       "details":[{
>           "match":true,
>           "value":5.339603,
>           "description":"queryWeight, product of:",
>           "details":[{
>               "match":true,
>               "value":5.339603,
>               "description":"idf(docFreq=31905, maxDocs=2446459)"},
>             {
>               "match":true,
>               "value":1.0,
>               "description":"queryNorm"}]},
>         {
>           "match":true,
>           "value":5.339603,
>           "description":"fieldWeight in 43472, product of:",
>           "details":[{
>               "match":true,
>               "value":1.0,
>               "description":"tf(freq=1.0), with freq of:",
>               "details":[{
>                   "match":true,
>                   "value":1.0,
>                   "description":"termFreq=1.0"}]},
>             {
>               "match":true,
>               "value":5.339603,
>               "description":"idf(docFreq=31905, maxDocs=2446459)"},
>             {
>               "match":true,
>               "value":1.0,
>               "description":"fieldNorm(doc=43472)"}]}]}]},