From: Oliver Xu
Subject: A Problem in Customizing DefaultSimilarity
Date: Wed, 12 Jun 2013 21:00:01 +0800

Dear all,

I built my own scoring class by extending DefaultSimilarity and overrode three of its main methods:

1. public float lengthNorm(FieldInvertState state)
2. public float tf(float freq)
3. public float idf(long docFreq, long numDocs)

I added print statements to each overridden method so that the console shows which method is called and when. With these in place, I found that only tf() and idf() are called during a search. lengthNorm(), which is the method I actually want to customize, is never called.

I then rolled back to Lucene 3.5.0 and checked again. DefaultSimilarity in Lucene 3.5.0 uses a computeNorm() method instead of lengthNorm(), and the overridden computeNorm() is never called either.

I used explain() to check the components of each document's score. Besides the idf and tf scores, I did find a fieldNorm score, which has something to do with the document length.

My questions are:

1. Why are the overridden lengthNorm() (in Lucene 4.1.0) and computeNorm() (in Lucene 3.5.0) methods not called during a search?
2. How and where is fieldNorm calculated?

Thank you very much!

Oliver

Oliver Xu (徐永)
Aigine InfoTech Co. (语擎科技)
W: www.aigine.com
T: +86-189189 02886
E: oliver.xu@aigine.com
MSN: oliver_xuyong@msn.com
Weibo: 语擎-集体智慧编程