lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nicolas Lalevée <>
Subject Re: categorizing results
Date Sun, 18 Mar 2007 09:28:14 GMT
Le dimanche 18 mars 2007 06:55, Dima May a écrit :
>  <>
> I have a Lucene related questions/problem.
> My search results can potentially get very large 200,000+. I want to
> categorize my results. So for example if I have an indexed field "type"
> that has such things as CDs, books, videos, power drills, or anything else
> in the world, I would want to display how many results are found for each
> unique "type". So I may get 200 CDs and 3000 books and so on.
> The brute-force way of doing that is traversing the returned
> documents/results and counting each unique type. Unfortunately this
> solution is too slow. I am having trouble figuring out a faster approach
> and was hoping someone can suggest one.

The solution I use is using filters. So before doing the search, you have to 
already compute a pool a filters, one filter for each category.
Then use the search(Query, HitCollector), and use a collector which will 
dispatch in different queues the results, depending of which filter matched. 
The collect function of the collector should look like that :

        public void collect(int doc, float score) {
            for (int i = 0; i < filters.length; i++) {
                if (filters[i].get(doc)) {
                    DocScoreCategory docScoreCategory = new 
DocScoreCategory(doc, score, i);

It is nearly as fast as a normal search but has the background that some 
filters have to be computed, and it can take some time. In our application 
updates are done not so often, so we can generate them in background. With an 
index permanently updated this will be very time consuming.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message