lucene-solr-user mailing list archives

From Walter Underwood <wun...@wunderwood.org>
Subject Re: Single call for distributed IDF?
Date Tue, 24 Jan 2017 19:01:27 GMT
Specifically, I’m talking about this:

    <statsCache class="org.apache.solr.search.stats.LRUStatsCache"/>

Adding that line increased our 95th percentile response time by about 10 seconds (from 3.4 to 13 seconds).

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jan 24, 2017, at 10:43 AM, Joel Bernstein <joelsolr@gmail.com> wrote:
> 
> Ah, I thought you were just interested in a fast way to get at IDF. This
> approach does take a callback but it's really fast.
> 
> Joel Bernstein
> http://joelsolr.blogspot.com/
> 
> On Tue, Jan 24, 2017 at 1:39 PM, Walter Underwood <wunder@wunderwood.org>
> wrote:
> 
>> I know how to do it. You return df for each term and num_docs then
>> recalculate idf. I wrote up how we did it in Ultraseek XPA about ten years
>> ago, though with MonkeyRank instead of global IDF.
>> 
>> https://observer.wunderwood.org/2007/04/04/progressive-reranking/
>> 
>> I was wondering why Solr makes a separate request to each shard for that
>> information instead of piggybacking it on the original request.
>> 
>> wunder
>> Walter Underwood
>> wunder@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>>> On Jan 24, 2017, at 10:34 AM, Joel Bernstein <joelsolr@gmail.com> wrote:
>>> 
>>> This may help out:
>>> https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/ScoreNodesStream.java#L208
>>> 
>>> This points to some code that calculates global idf for a list of terms.
>>> Not sure if this matches your use case. It seems to be very fast.
>>> 
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/
>>> 
>>> On Tue, Jan 24, 2017 at 1:09 PM, Walter Underwood <wunder@wunderwood.org>
>>> wrote:
>>> 
>>>> I tried running with the LRUStatsCache for global IDF, but the performance
>>>> penalty was pretty big. The 95th percentile response time went from 3.4
>>>> seconds to 13 seconds. Oops.
>>>> 
>>>> We should not need a separate call to get the tf and df stats. Those are
>>>> already calculated when doing the first request. I worked on a search
>>>> engine that did it that way twenty years ago.
>>>> 
>>>> In the past, there would have been an IP obstacle, but I think that is
>>>> resolved.
>>>> 
>>>> wunder
>>>> Walter Underwood
>>>> wunder@wunderwood.org
>>>> http://observer.wunderwood.org/  (my blog)
>>>> 
>>>> 
>>>> 
>> 
>> 
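For readers following the thread, here is a minimal sketch of the merge step described above: each shard returns df for each query term plus its num_docs, and the coordinator sums them and recalculates idf. The response layout, the function names, and the sample numbers are hypothetical, not Solr's wire format. The formula is one common smoothed form, 1 + ln((N + 1) / (df + 1)); the formula Solr actually applies depends on the configured similarity.

```python
import math
from collections import Counter


def merge_shard_stats(shard_responses):
    """Sum num_docs and per-term df over all shard responses.

    Each response is assumed (hypothetically) to look like
    {"num_docs": int, "df": {term: int}}.
    """
    total_docs = 0
    total_df = Counter()
    for resp in shard_responses:
        total_docs += resp["num_docs"]
        total_df.update(resp["df"])  # adds counts term by term
    return total_docs, dict(total_df)


def global_idf(total_docs, total_df):
    """Recalculate idf from the merged counts.

    Assumed formula: 1 + ln((N + 1) / (df + 1)).
    """
    return {
        term: 1.0 + math.log((total_docs + 1) / (df + 1))
        for term, df in total_df.items()
    }


# Made-up per-shard statistics, for illustration only.
shards = [
    {"num_docs": 1000, "df": {"solr": 100, "idf": 5}},
    {"num_docs": 3000, "df": {"solr": 50}},
]

total_docs, total_df = merge_shard_stats(shards)
idf = global_idf(total_docs, total_df)

for term in sorted(idf):
    print(term, total_df[term], round(idf[term], 3))
```

Because the merge only needs two small numbers per term from each shard, those numbers could in principle ride along with the first-phase response, which is the piggybacking the thread asks about.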