From: Lahiru Samarakoon
To: java-user@lucene.apache.org
Date: Tue, 14 Jun 2011 10:41:37 +0530
Subject: Re: Modifying Length Normalization calculation

Hi Ian,

The order is right and your method is working for me.
Thanks

Lahiru

On Mon, Jun 13, 2011 at 7:15 PM, Ian Lea <ian.lea@gmail.com> wrote:
This is getting beyond my level of expertise, but I'll have a go at
your questions. Hopefully someone better informed will step in with
corrections or confirmation.

> ...
> The application calls the *writer.addDocument(d);* method and in this
> process the *lengthNorm(String fieldName, int numTerms)* method is called.
> I can extend the *DefaultSimilarity* class and override the
> *lengthNorm* method, but how can I explicitly specify the
> *numTerms* value?

I don't know that you can, but you don't have to use the value passed in.

> ...
> Is the *computeNorm* method called for every field, or is it only called for
> analyzed fields?

All indexed fields, at a guess. Which can be analyzed or not.

> Is the order in which we call *addDocument* the same as the order in which
> the *computeNorm* method is called?

Probably.

> Is there a possibility that I can access the *Document* object inside the
> *Similarity* class?

Not that I know of via API calls. If you had your own Similarity
implementation, and methods are called in the order you expect, you
could add a setDoc(Document) method and/or a setCalcValue(n) method
and use them as you wished in your custom computeNorm() or
lengthNorm() code.
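
The setter idea above can be sketched roughly as follows. This is only an illustration of the pattern, not Lucene code: `EffectiveLengthNorm` is a hypothetical stand-in with no Lucene dependency, its `lengthNorm(String, int)` merely mirrors the signature quoted in this thread, and the `1/sqrt(numTerms)` formula is assumed to match what DefaultSimilarity does. In a real index you would extend DefaultSimilarity, register it on the writer, and call the setter just before each `addDocument` call.

```java
// Hypothetical stand-in for a custom Similarity; it does not extend any Lucene class.
class EffectiveLengthNorm {
    // Value supplied by the application; -1 means "no override set".
    private int calcValue = -1;

    // Call this before adding a document to supply the effective term count.
    void setCalcValue(int effectiveNumTerms) {
        if (effectiveNumTerms <= 0) {
            throw new IllegalArgumentException("effectiveNumTerms must be positive");
        }
        this.calcValue = effectiveNumTerms;
    }

    // Call this after adding the document so the next one falls back to the tokenizer count.
    void clearCalcValue() {
        this.calcValue = -1;
    }

    // Same shape as lengthNorm(String fieldName, int numTerms): the passed-in
    // numTerms is ignored whenever the application has supplied its own value.
    float lengthNorm(String fieldName, int numTerms) {
        int n = (calcValue > 0) ? calcValue : numTerms;
        return (float) (1.0 / Math.sqrt(n)); // assumed default formula
    }

    public static void main(String[] args) {
        EffectiveLengthNorm sim = new EffectiveLengthNorm();
        System.out.println("tokenizer count only: " + sim.lengthNorm("body", 100));
        sim.setCalcValue(25); // the application decides only 25 terms are "effective"
        System.out.println("with override:        " + sim.lengthNorm("body", 100));
        sim.clearCalcValue();
    }
}
```

Note that this relies on the set-then-add ordering holding, which the thread only confirms empirically, and it would not be safe if several threads add documents through the same writer at once.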


--
Ian.


> On Mon, Jun 13, 2011 at 3:09 PM, Ian Lea <ian.lea@gmail.com> wrote:
>
>> org.apache.lucene.search.Similarity would be the place to look,
>> specifically computeNorm(String field, FieldInvertState state). There
>> is comprehensive info in the javadocs. Note that values are
>> calculated at indexing and stored in the index encoded, with some loss
>> of precision.
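
To make the loss of precision concrete, here is a small self-contained sketch of a one-byte float encoding with a 3-bit mantissa and 5-bit exponent, which is my understanding of how Lucene 3.x stores norms. The class and method names (`NormPrecisionSketch`, `encode`, `decode`) are my own and this is not the library's code; check the Similarity javadocs for the authoritative behaviour.

```java
// Illustrative one-byte float codec (3 mantissa bits, 5 exponent bits, zero exponent 15).
class NormPrecisionSketch {
    // Truncate a positive float to a single byte; tiny values clamp to 1, huge ones to 255.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);          // keep sign, exponent and top 3 mantissa bits
        if (small <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= ((63 - 15) << 3) + 0x100) {
            return (byte) -1;                  // overflow: largest representable value
        }
        return (byte) (small - ((63 - 15) << 3));
    }

    // Expand the byte back to a float; the discarded mantissa bits come back as zeros.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        // Compare the computed norm 1/sqrt(numTerms) with what survives the round trip.
        for (int numTerms : new int[] {100, 110, 128}) {
            float norm = (float) (1.0 / Math.sqrt(numTerms));
            System.out.println(numTerms + " terms: norm=" + norm
                    + " stored=" + decode(encode(norm)));
        }
    }
}
```

With only three mantissa bits, nearby field lengths can end up with the same stored norm, so small adjustments to the effective term count may have no visible effect on scoring.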
>>
>>
>> --
>> Ian.
>>
>> On Mon, Jun 13, 2011 at 7:31 AM, Lahiru Samarakoon <lahiruts@gmail.com>
>> wrote:
>> > Hi All,
>> >
>> > I want to change the length normalization calculation specific to my
>> > application by changing the "*number of terms*" according to my
>> > requirement. The "*StandardTokenizer*" works perfectly for my application.
>> > However, the *number of terms* calculated by the tokenizer is not the
>> > effective number of terms for the application. I have a mechanism to
>> > calculate that value, and I need to know how I can apply that value in
>> > length normalization calculations.
>> >
>> > Please advise.
>> >
>> > Thank you,
>> >
>> > Best Regards,
>> > Lahiru.
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
