From: Kasun Perera
Date: Tue, 19 Jun 2012 15:09:14 +0530
Subject: Re: Calculating Average Document Length with Lucene
To: java-user@lucene.apache.org

I found that this is the correct way to calculate the average document
length for documents that have three fields:

byte[] normsDocLengthArrField1 = indexReader.norms("filed1");
byte[] normsDocLengthArrField2 = indexReader.norms("filed2");
byte[] normsDocLengthArrField3 = indexReader.norms("filed3");

double sumLength = 0;
for (int i = 0; i < normsDocLengthArrField1.length; i++) {
    // decodeNorm: decodes a normalization factor stored in an index.
    double encodeLengthFOne = DefaultSimilarity.decodeNorm(normsDocLengthArrField1[i]);
    double encodeLengthFTwo = DefaultSimilarity.decodeNorm(normsDocLengthArrField2[i]);
    double encodeLengthFThree = DefaultSimilarity.decodeNorm(normsDocLengthArrField3[i]);

    // Recover each field's length from its norm, then add the three lengths.
    double lengthFieldOne = 1 / (encodeLengthFOne * encodeLengthFOne);
    double lengthFieldTwo = 1 / (encodeLengthFTwo * encodeLengthFTwo);
    double lengthFieldThree = 1 / (encodeLengthFThree * encodeLengthFThree);
    sumLength += lengthFieldOne + lengthFieldTwo + lengthFieldThree;
}
// Divide by the number of documents, not by the total number of norm entries.
this.avgDocLength = sumLength / (normsDocLengthArrField1.length);

Thanks

On Mon, Jun 18, 2012 at 8:48 AM, Kasun Perera wrote:

> I want to calculate average document length for document collection which
> each document having 3 different fields(filed1, field2,field3)
>
> This is the program to calculate average length when only one field is
> there.
>
> private byte[] normsDocLengthArr = null;
> private double avgDocLength;
>
> normsDocLengthArr = indexReader.norms("filed1");
> //norms-Returns the byte-encoded normalization factor for the named field of every document.
>
> double sumLength = 0;
> for (int i = 0; i < normsDocLengthArr.length; i++) {
>     double encodeLength = DefaultSimilarity.decodeNorm(normsDocLengthArr[i]);
>     //decodeNorm -Decodes a normalization factor stored in an index.
>     double length = 1 / (encodeLength * encodeLength);
>     sumLength += length;
> }
> this.avgDocLength = sumLength / normsDocLengthArr.length;
>
> This is how I extended it for all 3 fields.
>
> private byte[] normsDocLengthArrField1 = null;
> private byte[] normsDocLengthArrField2 = null;
> private byte[] normsDocLengthArrField3 = null;
> private double avgDocLength;
>
> normsDocLengthArrField1 = indexReader.norms("filed1");
> normsDocLengthArrField2 = indexReader.norms("filed2");
> normsDocLengthArrField3 = indexReader.norms("filed3");
> //norms-Returns the byte-encoded normalization factor for the named field of every document.
>
> double sumLength = 0;
> for (int i = 0; i < normsDocLengthArrField1.length; i++) {
>     double encodeLengthF1 = DefaultSimilarity.decodeNorm(normsDocLengthArrField1[i]);
>     double encodeLengthF2 = DefaultSimilarity.decodeNorm(normsDocLengthArrField2[i]);
>     double encodeLengthF3 = DefaultSimilarity.decodeNorm(normsDocLengthArrField3[i]);
>     //decodeNorm -Decodes a normalization factor stored in an index.
>     double length = 1 / {(encodeLengthF1 * encodeLengthF1)+(encodeLengthF2 * encodeLengthF2)+(encodeLengthF3 * encodeLengthF3)};
>     sumLength += length;
> }
> this.avgDocLength = sumLength / (normsDocLengthArrField1.length+ normsDocLengthArrField2.length+normsDocLengthArrField3.length;
>
> I just want to know whether my implementation of calculating Doc average
> length for 3 field is correct?
>
> --
> Regards
>
> Kasun Perera

--
Regards
Kasun Perera
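
For anyone who wants to check the arithmetic without an index at hand, here is a minimal sketch of the same calculation on already-decoded norms. It assumes the default norm of roughly 1 / sqrt(numTerms) with no index-time boosts, so 1 / norm^2 gives back an approximate term count; because norms are stored in a single byte, the recovered lengths are only approximate. The class name AvgDocLengthSketch and both method names are made up for illustration and are not part of Lucene.

```java
// Hypothetical helper, not part of Lucene: works on norms that have
// already been decoded to floats (e.g. via DefaultSimilarity.decodeNorm).
class AvgDocLengthSketch {

    // Approximate field length from a decoded norm, assuming
    // norm = 1 / sqrt(numTerms) and no index-time boost.
    public static double lengthFromNorm(float decodedNorm) {
        if (decodedNorm <= 0f) {
            return 0.0; // assumption: treat a zero norm as an empty field
        }
        return 1.0 / ((double) decodedNorm * decodedNorm);
    }

    // decodedNormsPerField[f][d] is the decoded norm of field f in document d.
    // Returns the sum of all field lengths divided by the number of documents.
    public static double averageDocLength(float[][] decodedNormsPerField) {
        if (decodedNormsPerField.length == 0 || decodedNormsPerField[0].length == 0) {
            return 0.0;
        }
        int numDocs = decodedNormsPerField[0].length;
        double sumLength = 0;
        for (float[] fieldNorms : decodedNormsPerField) {
            for (int d = 0; d < numDocs; d++) {
                sumLength += lengthFromNorm(fieldNorms[d]);
            }
        }
        return sumLength / numDocs;
    }

    public static void main(String[] args) {
        // Three fields, two documents; norms chosen so the lengths are exact.
        float[][] norms = {
            {0.5f, 1.0f},   // field 1: lengths 4 and 1
            {1.0f, 1.0f},   // field 2: lengths 1 and 1
            {0.25f, 0.5f}   // field 3: lengths 16 and 4
        };
        System.out.println(averageDocLength(norms)); // prints 13.5
    }
}
```

The divisor is the document count rather than the sum of the three array lengths, since each document contributes one entry to every field's norms array.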