From: Ian Lea
Date: Mon, 13 Jun 2011 14:45:29 +0100
Subject: Re: Modifying Length Normalization calculation
To: java-user@lucene.apache.org

This is getting beyond my level of expertise, but I'll have a go at
your questions. Hopefully someone better informed will step in with
corrections or confirmation.

> ...
> The application calls the *writer.addDocument(d);* method and in this
> process the *lengthNorm(String fieldName, int numTerms)* method is called.
> I can extend the *DefaultSimilarity* class and override the
> *lengthNorm* method, but how can I explicitly specify the
> *numTerms* value?

I don't know that you can, but you don't have to use the value passed in.

> ...
> Is the *computeNorm* method called for every field or is it only called for
> analyzed fields?

All indexed fields, at a guess, whether analyzed or not.

> Is the order in which we call *addDocument* the same as the order in which
> the *computeNorm* method is called?

Probably.

> Is there a possibility that I can access the *Document* object inside the
> *Similarity* class?

Not that I know of via API calls.

If you had your own Similarity implementation, and methods are called
in the order you expect, you could add a setDoc(Document) method
and/or a setCalcValue(n) method and use them as you wished in your
custom computeNorm() or lengthNorm() code.


--
Ian.


> On Mon, Jun 13, 2011 at 3:09 PM, Ian Lea wrote:
>
>> org.apache.lucene.search.Similarity would be the place to look,
>> specifically computeNorm(String field, FieldInvertState state).  There
>> is comprehensive info in the javadocs.  Note that values are
>> calculated at indexing and stored in the index encoded, with some loss
>> of precision.
>>
>>
>> --
>> Ian.
>>
>> On Mon, Jun 13, 2011 at 7:31 AM, Lahiru Samarakoon wrote:
>> > Hi All,
>> >
>> > I want to change the length normalization calculation specific to my
>> > application, by changing the "*number of terms*" according to my
>> > requirement. The "*StandardTokenizer*" works perfectly for my application.
>> > However, the *number of terms* calculated by the tokenizer is not the
>> > effective number of terms for the application. I have a mechanism to
>> > calculate that value and I need to know how I can apply that value in
>> > length normalization calculations.
>> >
>> > Please advise.
>> >
>> > Thank you,
>> >
>> > Best Regards,
>> > Lahiru.
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
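
The setCalcValue(n) idea from the reply could look roughly like the
sketch below. It is only an illustration under stated assumptions, not
tested against any Lucene release. BaseSimilarity is a hypothetical
stand-in so the code compiles without Lucene on the classpath; in a real
application you would extend org.apache.lucene.search.DefaultSimilarity
instead. The 1/sqrt(numTerms) formula is assumed to mirror the default
length norm, with field boost left out. EffectiveLengthSimilarity and
setEffectiveNumTerms are made-up names.

```java
// Hypothetical stand-in for Lucene's DefaultSimilarity, so this sketch
// is self-contained. It is NOT a Lucene class.
class BaseSimilarity {
    // Assumed to mirror the default length norm: 1 / sqrt(numTerms).
    public float lengthNorm(String fieldName, int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }
}

// Hypothetical custom similarity: the application supplies its own
// "effective" term count, and lengthNorm() ignores the value passed in.
class EffectiveLengthSimilarity extends BaseSimilarity {
    // Values <= 0 mean "not set": fall back to the passed-in count.
    private int effectiveNumTerms = 0;

    // Call this before adding each document to the index.
    public void setEffectiveNumTerms(int n) {
        this.effectiveNumTerms = n;
    }

    @Override
    public float lengthNorm(String fieldName, int numTerms) {
        // Prefer the application-supplied count when one has been set.
        int n = effectiveNumTerms > 0 ? effectiveNumTerms : numTerms;
        return super.lengthNorm(fieldName, n);
    }
}
```

You would call setEffectiveNumTerms(n) just before each
writer.addDocument(d). This relies on the norm being computed during
that call and on a single indexing thread, neither of which is
confirmed in this thread. It also applies the same count to every
indexed field of the document.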