lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: Modifying Length Normalization calculation
Date Mon, 13 Jun 2011 13:45:29 GMT
This is getting beyond my level of expertise, but I'll have a go at
your questions.  Hopefully someone better informed will step in with
corrections or confirmation.

> ...
> The application calls the *writer.addDocument(d);* method and in this
> process the *lengthNorm(String fieldName, int numTerms)*  method is called.
> I can extend the *DefaultSimilarity* class and override the
> *lengthNorm* method, but how can I explicitly specify the
> *numTerms* value?

I don't know that you can, but you don't have to use the value passed in.
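For what it's worth, DefaultSimilarity's lengthNorm in the 3.x line is effectively 1/sqrt(numTerms). A plain-Java sketch of what "not using the value passed in" would mean — no Lucene dependency, and the class and method names here are illustrative, not API:

```java
// Sketch of the default length normalization, 1/sqrt(numTerms),
// reimplemented in plain Java for illustration. In a real override you
// would extend DefaultSimilarity and return a different value from
// lengthNorm(fieldName, numTerms) -- e.g. ignoring the count Lucene
// passes in and substituting your own.
public class LengthNormSketch {
    // What DefaultSimilarity.lengthNorm effectively computes.
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // A hypothetical override that ignores the tokenizer's count and
    // uses an externally supplied "effective" term count instead.
    static float customLengthNorm(int ignoredNumTerms, int effectiveNumTerms) {
        return (float) (1.0 / Math.sqrt(effectiveNumTerms));
    }

    public static void main(String[] args) {
        System.out.println(defaultLengthNorm(4));       // 0.5
        System.out.println(customLengthNorm(100, 16));  // 0.25, ignores the 100
    }
}
```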

> ...
> Is the *computeNorm* method called for every field, or only for
> analyzed fields?

All indexed fields, at a guess, whether analyzed or not.

> Is the order in which we call *addDocument* the same as the order in
> which *computeNorm* is called?

Probably.

> Is there a possibility that I can access the *Document* object inside
> the *Similarity* class?

Not that I know of via API calls. But if you had your own Similarity
implementation, and the methods are called in the order you expect, you
could add a setDoc(Document) method and/or a setCalcValue(n) method
and use them as you wish in your custom computeNorm() or
lengthNorm() code.
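A sketch of that pattern, stripped of the Lucene-specific subclassing so it stands alone (class and method names are hypothetical, not Lucene API, and it assumes single-threaded indexing where call order is deterministic):

```java
// Sketch of the "push state in from outside" pattern: before each
// writer.addDocument(doc) call, hand your Similarity the precomputed
// value, and have the norm computation use it instead of the count the
// indexer passes in. Names here are hypothetical.
public class CustomNormSimilarity {
    // Effective term count supplied by the application for the
    // document about to be indexed.
    private int effectiveNumTerms = 1;

    // Call this just before writer.addDocument(doc).
    public void setCalcValue(int effectiveNumTerms) {
        this.effectiveNumTerms = effectiveNumTerms;
    }

    // Stand-in for an overridden lengthNorm/computeNorm: ignores the
    // numTerms argument and uses the injected value.
    public float lengthNorm(String fieldName, int numTerms) {
        return (float) (1.0 / Math.sqrt(effectiveNumTerms));
    }

    public static void main(String[] args) {
        CustomNormSimilarity sim = new CustomNormSimilarity();
        sim.setCalcValue(9);
        // Computes 1/sqrt(9): the injected 9 is used, not the 250.
        System.out.println(sim.lengthNorm("body", 250));
    }
}
```

The obvious caveat is the one above: this only works if the indexer really does call your norm code for the document you last set, which is why the call-order question matters.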


--
Ian.


> On Mon, Jun 13, 2011 at 3:09 PM, Ian Lea <ian.lea@gmail.com> wrote:
>
>> org.apache.lucene.search.Similarity would be the place to look,
>> specifically computeNorm(String field, FieldInvertState state).  There
>> is comprehensive info in the javadocs.  Note that values are
>> calculated at indexing and stored in the index encoded, with some loss
>> of precision.
>>
>>
>> --
>> Ian.
>>
>> On Mon, Jun 13, 2011 at 7:31 AM, Lahiru Samarakoon <lahiruts@gmail.com>
>> wrote:
>> > Hi All,
>> >
>> > I want to change the length normalization calculation specific to my
>> > application, by changing the "*number of terms*" according to my
>> > requirements. The *StandardTokenizer* works perfectly for my
>> > application; however, the *number of terms* calculated by the
>> > tokenizer is not the effective number of terms for the application.
>> > I have a mechanism to calculate that value, and I need to know how I
>> > can apply it in length normalization calculations.
>> >
>> > Please advise.
>> >
>> > Thank you,
>> >
>> > Best Regards,
>> > Lahiru.
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
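On the point above about norm values being stored encoded with some loss of precision, a simplified plain-Java illustration: the byte quantizer below is deliberately not Lucene's actual SmallFloat encoding, just an analogous single-byte round trip showing the effect.

```java
// Simplified illustration of why stored norms lose precision: Lucene
// encodes each norm into a single byte at index time. This quantizer
// is NOT Lucene's real encoding, just an analogous fixed-step byte
// quantization.
public class NormPrecisionSketch {
    // Quantize a norm in [0, 1] to one of 256 levels.
    static int encode(float norm) {
        return Math.round(norm * 255f);
    }

    static float decode(int b) {
        return b / 255f;
    }

    public static void main(String[] args) {
        float original = (float) (1.0 / Math.sqrt(7)); // ~0.37796
        float roundTripped = decode(encode(original));
        // The round-tripped value is close to, but not exactly,
        // the original -- the precision loss mentioned above.
        System.out.println(original);
        System.out.println(roundTripped);
    }
}
```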
