lucene-solr-user mailing list archives

From Jonathan Rochkind <>
Subject Re: min/max, StatsComponent, performance
Date Wed, 04 Aug 2010 00:51:39 GMT

Chris Hostetter wrote:
> Honestly: if you have a really small cardinality for these numeric 
> values (ie: small enough to return every value on every request) perhaps 
> you should use faceting to find the min/max values (with facet.mincount=1) 
> instead of stats?
Thanks for the tips and info.

I can't figure out any way to use faceting to find min/max values. If I 
do a facet.sort=index, and facet.limit=1, then the facet value returned 
would be the min value... but how could I get the max value?  There is 
no facet.sort=rindex or what have you.  Ah, you say small enough to 
return every value on every request. Nope, it's not THAT small.  I've 
got about 3 million documents, and 2-10k unique integers in a field, and 
I want to find the min/max.
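
A minimal sketch, in Python, of the request parameters for the facet approach described above (minimum only). The field name `my_field` and the `*:*` query are placeholders, and the sketch assumes the field type's index order matches numeric order, which is not true of every Solr field type.

```python
from urllib.parse import urlencode


def facet_min_params(field, q="*:*"):
    """Build Solr parameters that return only the first facet value in index order."""
    return {
        "q": q,
        "rows": 0,               # no documents needed, only facet values
        "facet": "true",
        "facet.field": field,
        "facet.sort": "index",   # order facet values by indexed term
        "facet.limit": 1,        # keep only the first value
        "facet.mincount": 1,     # skip values absent from the result set
    }


# "my_field" is a placeholder field name (assumption).
query_string = urlencode(facet_min_params("my_field"))
```

There is no reverse index sort to pair with this, so the maximum still needs another approach.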

I guess, if I both index and store the field (which I guess I have to do 
anyway), I can find min and max via two separate queries. Sort by 
my_field asc, sort by my_field desc, with rows=1 both times, get out the 
stored field, that's my min/max.
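
The two-query approach can be sketched like this in Python. It only builds the parameter sets; `my_field` and the example `fq` value are placeholders, and actually sending the requests to a Solr select handler is left out.

```python
from urllib.parse import urlencode


def min_max_params(field, q="*:*", fq=None):
    """Return two Solr parameter dicts: one sorted ascending, one descending."""
    base = {"q": q, "rows": 1, "fl": field}  # one row, only the stored field
    if fq is not None:
        base["fq"] = fq                      # reuse the same filter as the main query
    min_query = {**base, "sort": f"{field} asc"}   # first doc holds the minimum
    max_query = {**base, "sort": f"{field} desc"}  # first doc holds the maximum
    return min_query, max_query


# Placeholder field and filter values (assumptions for illustration).
min_q, max_q = min_max_params("my_field", fq="format:book")
min_qs, max_qs = urlencode(min_q), urlencode(max_q)
```

The stored value of `my_field` in the single returned document of each response is then the min or max for that result set.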

That might be what I resort to. But it's a shame, StatsComponent can 
give me the info "included" in the query I'm already making, as opposed 
to requiring two additional queries on top of that -- which you'd think 
would be _slower_, but doesn't in fact seem to be.
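
For comparison, a sketch of what "included in the query I'm already making" means: StatsComponent is switched on by adding two parameters to the existing request. The helper name and the example query values are hypothetical.

```python
def with_stats(params, field):
    """Return a copy of existing Solr query parameters with stats enabled for one field."""
    out = dict(params)          # leave the caller's dict untouched
    out["stats"] = "true"       # turn StatsComponent on
    out["stats.field"] = field  # field to compute min/max (and the other stats) for
    return out


# Placeholder main query (assumption for illustration).
main_query = {"q": "solr", "fq": "format:book", "rows": 10}
stats_query = with_stats(main_query, "my_field")
```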

> I don't think so .. I believe Ryan considered this when he first added 
> StatsComponent, but he decided it wasn't really worth the trouble -- all 
> of the stats are computed in a single pass, and the majority of the time 
> is spent getting the value of every doc in the set -- adding each value to 
> a running total (for the sum and ultimately computing the median) is a 
> really cheap operation compared to the actual iteration over the set.
Yeah, it's really kind of a mystery to me why StatsComponent is being so 
slow. StatsComponent is slower than faceting on the field, and is even 
slower than the total time of: 1) First making the initial query, 
filling all caches, 2) Then making two additional queries with the same 
q/fq, but with different sorts to get min and max from the result set in 

From what you say, there's no good reason for StatsComponent to be 
slower than these alternatives, but it is, by an order of magnitude (1-2 
seconds vs 10-15 seconds).

I guess I'd have to get into Java profiling/debugging to figure it out, 
maybe a weird bug or mis-design somewhere I'm tripping.

