lucene-java-user mailing list archives

From Chris Hostetter
Subject Re: Search Results Clustering
Date Tue, 30 Aug 2005 20:52:07 GMT

: Suppose I cluster the results only on the 1st field i.e. I do not show
: the constituent clusters. Even in this case, i'll require around 900
: Filters[i have 900 unique terms] in memory and will have to run the same
: query 900 times, 1 on each Filter. I am sitting at a situation where I
: get around 15 queries/sec on an average. Even if I spare another machine
: to return me the clustering results, I'll be firing 15*90 = 1350
: queries/sec.

1) If I remember correctly, the fact that query X takes S seconds to
complete doesn't mean that executing X N times in rapid succession
will take N * S seconds.  There is some internal caching taking place.

2) Regardless of what type of complex Query objects you need to generate
the set of products that match your user's search, or the set of
products in each category (what you've been calling a cluster), and
regardless of which search method you use to generate the main list of
results, you can use the QueryFilter class to translate each Query into
a Filter, and call Filter.bits to get a BitSet for each category's
Filter (and for your main result set).  Intersecting a category's BitSet
with the main result BitSet gives the count you need for that category.
These BitSets can be cached for as long as your index remains
unmodified (using CachingWrapperFilter, for example), which means the
work required to do those 900 category-specific queries only happens
once each time the index is changed, not once per user search.
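
The intersection step might look roughly like the sketch below. It uses
only java.util.BitSet, so it runs without Lucene on the classpath. In a
real setup each BitSet would presumably come from something like
new QueryFilter(query).bits(reader), cached per category. The class
name CategoryCounts, the method countPerCategory, and the sample
category names are made up for illustration, and the code uses current
Java syntax rather than what was available in 2005.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: not part of Lucene.
class CategoryCounts {

    // For each category, count the documents that are in both the
    // category's BitSet and the main result BitSet.
    static Map<String, Integer> countPerCategory(
            BitSet mainResults, Map<String, BitSet> categoryBits) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> entry : categoryBits.entrySet()) {
            // Clone first: BitSet.and() modifies the receiver, and the
            // cached per-category bits must stay intact for later searches.
            BitSet overlap = (BitSet) entry.getValue().clone();
            overlap.and(mainResults);
            counts.put(entry.getKey(), overlap.cardinality());
        }
        return counts;
    }

    public static void main(String[] args) {
        // Bit i set means "document i matches". Stand-in for the bits
        // of the user's search.
        BitSet mainResults = new BitSet();
        mainResults.set(1);
        mainResults.set(3);
        mainResults.set(4);
        mainResults.set(7);

        // Stand-ins for the cached per-category bits (assumed data).
        BitSet books = new BitSet();
        books.set(0);
        books.set(1);
        books.set(3);
        BitSet music = new BitSet();
        music.set(4);
        music.set(5);
        BitSet toys = new BitSet();
        toys.set(2);
        toys.set(6);

        Map<String, BitSet> categoryBits = new LinkedHashMap<>();
        categoryBits.put("books", books);
        categoryBits.put("music", music);
        categoryBits.put("toys", toys);

        // Prints {books=2, music=1, toys=0}
        System.out.println(countPerCategory(mainResults, categoryBits));
    }
}
```

Each user search then costs one query for the main result set plus 900
cheap bitwise ANDs, instead of 900 additional queries.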

