lucene-java-user mailing list archives

From kdev <v.verro...@di.uoa.gr>
Subject Re: Scoring formula - Average number of terms in IDF
Date Thu, 17 Dec 2009 15:50:42 GMT

If I follow your approach and compute the average outside of Lucene while I am
building the index (for performance reasons I can't wait for all the
documents to arrive before indexing them), the average for a collection will be
ready only once all of its documents have been indexed.
The Lucene documentation says that a new Similarity class must be set with
IndexWriter.setSimilarity() and is used while the index is being built, and at
that point the average isn't ready yet. Is there a way to overcome this? And if
the score is not calculated while the index is being created, but only when
searching the index, what will the consequences for performance be?
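
To make the timing problem concrete, here is a minimal sketch of the kind of
running average I have in mind. It is plain Java with no Lucene calls; the
class name RunningFieldLengthStats and its methods are hypothetical, and the
token count per document is assumed to come from something like the counting
TokenFilter suggested below.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical helper, not part of Lucene: accumulates per-document token
 * counts while documents are being indexed, so that an average is available
 * at any point (exact only once the whole collection has been added).
 */
class RunningFieldLengthStats {
    private final AtomicLong totalTokens = new AtomicLong();
    private final AtomicLong docCount = new AtomicLong();

    /** Call once per document with the number of tokens its field produced. */
    void addDocument(long tokenCount) {
        totalTokens.addAndGet(tokenCount);
        docCount.incrementAndGet();
    }

    /** Number of documents recorded so far. */
    long docCount() {
        return docCount.get();
    }

    /**
     * Average tokens per document over what has been recorded so far.
     * Returns 0.0 for an empty collection. The two counters are read
     * separately, so under concurrent updates the value is approximate.
     */
    double averageLength() {
        long docs = docCount.get();
        return docs == 0 ? 0.0 : (double) totalTokens.get() / docs;
    }
}
```

The value returned halfway through indexing generally differs from the final
one, which is exactly why anything baked into the index at write time would
be based on a stale average.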

(Mike, thank you for your response.)


Michael McCandless-2 wrote:
> 
> There have been some discussions, here:
> 
>     https://issues.apache.org/jira/browse/LUCENE-2091
> 
> about how Lucene could track avg field/doc length, but they are just
> brainstorming type discussions now.
> 
> You could always do something approximate outside of Lucene?  EG, make
> a TokenFilter that counts how many tokens are produced for each
> field/doc, aggregate & store that yourself, and use it in your
> similarity impl?
> 
> Mike
> 
> On Tue, Dec 15, 2009 at 5:04 AM, kdev <v.verroios@di.uoa.gr> wrote:
>>
>> any ideas please?
>> --
>> View this message in context:
>> http://old.nabble.com/Scoring-formula---Average-number-of-terms-in-IDF-tp26282578p26792364.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>

-- 
View this message in context: http://old.nabble.com/Scoring-formula---Average-number-of-terms-in-IDF-tp26282578p26830145.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

