From: Ian Lea <ian.lea@gmail.com>
Date: Mon, 13 Jun 2011 14:45:29 +0100
Message-ID: <BANLkTikYvdsvRkuWEg4XxnMX8ZwaEuPTUQ@mail.gmail.com>
Subject: Re: Modifying Length Normalization calculation
To: java-user@lucene.apache.org

This is getting beyond my level of expertise, but I'll have a go at
your questions.  Hopefully someone better informed will step in with
corrections or confirmation.

> ...
> The application calls the *writer.addDocument(d);* method and in this
> process the *lengthNorm(String fieldName, int numTerms)* method is called.
> I can extend the *DefaultSimilarity* class and override the
> *lengthNorm* method, but how can I explicitly specify the
> *numTerms* value?

I don't know that you can, but you don't have to use the value passed in.

> ...
> Is the *computeNorm* method called for every field, or only for
> analyzed fields?

All indexed fields, at a guess, whether they are analyzed or not.

> Is the order in which we call *addDocument* the same as the order in
> which the *computeNorm* method is called?

Probably.

> Is there a possibility that I can access the *Document* object inside the
> *Similarity* class?

Not that I know of via API calls. If you had your own Similarity
implementation, and methods are called in the order you expect, you
could add a setDoc(Document) method and/or a setCalcValue(n) method
and use them as you wished in your custom computeNorm() or
lengthNorm() code.
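
A minimal sketch of that idea, untested against a real index. To keep it
compilable without Lucene on the classpath, StandInSimilarity below is a
hypothetical stand-in for DefaultSimilarity; in a real project you would
extend DefaultSimilarity instead. EffectiveLengthSimilarity and the
setCalcValue(n) setter are illustrative names, not part of the Lucene API.

```java
// Hypothetical stand-in for Lucene's DefaultSimilarity, so that this sketch
// compiles on its own. Extend DefaultSimilarity in a real project.
class StandInSimilarity {
    // Inverse square root of the term count, the usual default length norm.
    public float lengthNorm(String fieldName, int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }
}

// Hypothetical subclass: ignores the numTerms value passed in at indexing
// time whenever the application has supplied its own effective term count.
class EffectiveLengthSimilarity extends StandInSimilarity {
    private int effectiveTerms = -1; // -1 means "no override supplied"

    // Illustrative setter (the setCalcValue(n) idea). The application would
    // call it just before writer.addDocument(d) for the matching document.
    public void setCalcValue(int effectiveTerms) {
        this.effectiveTerms = effectiveTerms;
    }

    @Override
    public float lengthNorm(String fieldName, int numTerms) {
        // Prefer the application's count; fall back to the tokenizer's.
        int n = (effectiveTerms > 0) ? effectiveTerms : numTerms;
        return super.lengthNorm(fieldName, n);
    }
}
```

This only works if the norm really is computed during the addDocument() call
for the document you just set the value for, so it assumes indexing from a
single thread. The same override applies to every field of that document
unless you also branch on fieldName.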


--
Ian.


> On Mon, Jun 13, 2011 at 3:09 PM, Ian Lea <ian.lea@gmail.com> wrote:
>
>> org.apache.lucene.search.Similarity would be the place to look,
>> specifically computeNorm(String field, FieldInvertState state).  There
>> is comprehensive info in the javadocs.  Note that values are
>> calculated at indexing and stored in the index encoded, with some loss
>> of precision.
>>
>>
>> --
>> Ian.
>>
>> On Mon, Jun 13, 2011 at 7:31 AM, Lahiru Samarakoon <lahiruts@gmail.com>
>> wrote:
>> > Hi All,
>> >
>> > I want to change the length normalization calculation for my
>> > application by changing the "*number of terms*" according to my
>> > requirements. The "*StandardTokenizer*" works perfectly for my
>> > application. However, the *number of terms* calculated by the
>> > tokenizer is not the effective number of terms for the application.
>> > I have a mechanism to calculate that value and I need to know how
>> > I can apply it in the length normalization calculations.
>> >
>> > Please advise.
>> >
>> > Thank you,
>> >
>> > Best Regards,
>> > Lahiru.
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>