lucene-solr-user mailing list archives

From Jonathan Rochkind <rochk...@jhu.edu>
Subject Re: min/max, StatsComponent, performance
Date Wed, 04 Aug 2010 00:51:39 GMT


Chris Hostetter wrote:
> Honestly: if you have a really small cardinality for these numeric 
> values (ie: small enough to return every value on every request) perhaps 
> you should use faceting to find the min/max values (with facet.mincount=1) 
> instead of stats?
>   
Thanks for the tips and info.

I can't figure out any way to use faceting to find min/max values. If I 
do a facet.sort=index, and facet.limit=1, then the facet value returned 
would be the min value... but how could I get the max value?  There is 
no facet.sort=rindex or what have you.  Ah, you say small enough to 
return every value on every request. Nope, it's not THAT small.  I've 
got about 3 million documents, and 2-10k unique integers in a field, and 
I want to find the min/max.
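
For reference, a rough sketch of the facet request described above, written as a Python parameter dict. It only yields the minimum, and it assumes the field type's indexed term order matches numeric order; the field name `my_field` and the helper name are placeholders, not anything from my actual setup.

```python
from urllib.parse import urlencode

def facet_min_params(field, q="*:*"):
    """Build Solr params asking for the first facet value in index order.

    Hypothetical helper: with facet.sort=index and facet.limit=1 the single
    facet value returned is the lowest indexed term that has at least
    facet.mincount matching documents.
    """
    return {
        "q": q,
        "rows": 0,              # only the facet counts are needed
        "facet": "true",
        "facet.field": field,
        "facet.sort": "index",  # order by indexed term, not by count
        "facet.limit": 1,
        "facet.mincount": 1,    # skip values with no matching docs
    }

if __name__ == "__main__":
    print(urlencode(facet_min_params("my_field")))
```

There is no matching parameter set for the maximum, which is the gap described above.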

I guess, if I both index and store the field (which I guess I have to do 
anyway), I can find min and max via two separate queries. Sort by 
my_field asc, sort by my_field desc, with rows=1 both times, get out the 
stored field, that's my min/max.
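
A minimal sketch of those two requests, assuming a Solr select handler at a placeholder URL and a placeholder field name `my_field`. It only builds the URLs and does not send them; the helper name is made up for illustration.

```python
from urllib.parse import urlencode

def min_max_queries(base_url, field, q, fq=None):
    """Return the two request URLs: one sorted ascending, one descending.

    Hypothetical helper. Each request asks for a single row and only the
    stored value of `field`, so the first document returned carries the
    min (asc) or the max (desc) within the q/fq result set.
    """
    urls = {}
    for name, direction in (("min", "asc"), ("max", "desc")):
        params = [
            ("q", q),
            ("sort", f"{field} {direction}"),
            ("rows", 1),
            ("fl", field),
            ("wt", "json"),
        ]
        if fq is not None:
            params.append(("fq", fq))
        urls[name] = f"{base_url}?{urlencode(params)}"
    return urls

if __name__ == "__main__":
    # The base URL is an assumption, not taken from the original message.
    for name, url in min_max_queries(
        "http://localhost:8983/solr/select", "my_field", "*:*"
    ).items():
        print(name, url)
```

Depending on the schema, documents with no value in the field may sort first, so a filter such as `my_field:[* TO *]` may be needed to exclude them.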

That might be what I resort to. But it's a shame, StatsComponent can 
give me the info "included" in the query I'm already making, as opposed 
to requiring two additional queries on top of that -- which you'd think 
would be _slower_, but doesn't in fact seem to be.
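
For comparison, a sketch of the StatsComponent variant: the stats parameters are added to the existing query, and min/max are read out of the JSON response. The field name, helper names, and sample response values below are made up for illustration.

```python
def with_stats(params, field):
    """Return a copy of the query params with StatsComponent enabled."""
    merged = dict(params)
    merged["stats"] = "true"
    merged["stats.field"] = field
    return merged

def extract_min_max(response, field):
    """Pull min and max for `field` out of a parsed wt=json response."""
    field_stats = response["stats"]["stats_fields"][field]
    return field_stats["min"], field_stats["max"]

if __name__ == "__main__":
    params = with_stats({"q": "*:*", "wt": "json"}, "my_field")
    print(params)

    # Hand-written sample shaped like a stats response; values are invented.
    sample = {"stats": {"stats_fields": {"my_field": {"min": 3.0, "max": 9871.0}}}}
    print(extract_min_max(sample, "my_field"))
```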


> I don't think so .. i believe Ryan considered this when he first added 
> StatsComponent, but he decided it wasn't really worth the trouble -- all 
> of the stats are computed in a single pass, and the majority of the time 
> is spent getting the value of every doc in the set -- adding each value to 
> a running total (for the sum and ultimately computing the median) is a 
> really cheap operation compared to the actual iteration over the set.
>   
Yeah, it's really kind of a mystery to me why StatsComponent is being so 
slow. StatsComponent is slower than faceting on the field, and is even 
slower than the total time of: 1) First making the initial query, 
filling all caches, 2) Then making two additional queries with the same 
q/fq, but with different sorts to get min and max from the result set in 
#1.

From what you say, there's no good reason for StatsComponent to be 
slower than these alternatives, but it is, by an order of magnitude (1-2 
seconds vs 10-15 seconds).

I guess I'd have to get into Java profiling/debugging to figure it out, 
maybe a weird bug or mis-design somewhere I'm tripping.

Jonathan
