lucene-dev mailing list archives

From "Martijn van Groningen (Commented) (JIRA)"
Subject [jira] [Commented] (LUCENE-3972) Improve AllGroupsCollector implementations
Date Thu, 12 Apr 2012 16:57:17 GMT


Martijn van Groningen commented on LUCENE-3972:

bq. If you have fewer unique groups (and as the number of docs collected goes up), I think the current impl should be faster...?
This is true. I ran a few tests on an index containing 10M docs:
||Run||Unique groups (approx.)||Time, collector in patch||Time, committed collector||
|1|~65000|892 ms|132 ms|
|2|~645000|1183 ms|878 ms|
|3|~953000|1291 ms|1517 ms|
|4|~1819000|1783 ms|3762 ms|
|5|~3332000|2703 ms|4882 ms|
|6|~6668000|4840 ms|18989 ms|

All times are averages over 10 executions of a match-all query.
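The crossover can be read off the table by dividing the committed collector's time by the patched collector's time for each run. A ratio above 1 means the patch was faster. The small Java sketch below does that arithmetic, with the numbers copied from the table. The class and method names are only for illustration.

```java
// Illustrative helper: numbers copied from the table above (10M docs, average ms over 10 runs).
class GroupingTimings {
    static final long[] UNIQUE_GROUPS = {65_000, 645_000, 953_000, 1_819_000, 3_332_000, 6_668_000};
    static final long[] PATCH_MS = {892, 1183, 1291, 1783, 2703, 4840};
    static final long[] COMMITTED_MS = {132, 878, 1517, 3762, 4882, 18989};

    // run is a 0-based index into the arrays; a result > 1.0 means the patch was faster.
    static double committedOverPatch(int run) {
        return (double) COMMITTED_MS[run] / PATCH_MS[run];
    }

    public static void main(String[] args) {
        for (int i = 0; i < PATCH_MS.length; i++) {
            double ratio = committedOverPatch(i);
            System.out.printf("run %d: ~%d groups, committed/patch = %.2f (%s faster)%n",
                    i + 1, UNIQUE_GROUPS[i], ratio, ratio > 1.0 ? "patch" : "committed");
        }
    }
}
```

The ratio is below 1 for runs 1 and 2 and above 1 for runs 3 to 6. For example, run 3 gives 1.18 and run 6 gives 3.92.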

bq. the time is likely dominated by re-ord'ing for each segment?
During run 4 I noticed that, for the committed collector, 3470 ms of its total 3762 ms (about 92%) was spent on re-ord'ing groups for each segment.

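The model below makes the re-ord'ing cost concrete. It reflects how I understand the committed, ord-based collector to work, which is an assumption and not the actual Lucene code. Groups are collected by segment-local ord, so on every segment switch each group value collected so far is binary-searched in the new segment's sorted term dictionary. The class OrdBasedAllGroups and the String[]-per-segment model are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of an ord-based all-groups collector (not Lucene code).
// A segment is modelled as a sorted term dictionary; docs refer to their group by ord.
class OrdBasedAllGroups {
    final List<String> groups = new ArrayList<>();          // unique group values, first-seen order
    private final Set<Integer> seenOrds = new HashSet<>();  // ords already collected in this segment
    private String[] segmentTerms = new String[0];
    long reOrdLookups = 0;                                  // binary searches spent re-ord'ing

    void setNextSegment(String[] sortedTerms) {
        segmentTerms = sortedTerms;
        seenOrds.clear();
        // Re-ord: every group collected so far is looked up in the new segment,
        // whether or not any doc in this segment belongs to it.
        for (String group : groups) {
            reOrdLookups++;
            int ord = Arrays.binarySearch(sortedTerms, group);
            if (ord >= 0) {
                seenOrds.add(ord);
            }
        }
    }

    void collect(int ord) {
        // Per-doc work is just an int-set check; the value is resolved only for new groups.
        if (seenOrds.add(ord)) {
            groups.add(segmentTerms[ord]);
        }
    }
}
```

Suppose G groups have been collected so far and the segment holds T terms. Then each segment switch costs roughly G * log2(T) comparisons, independent of how many docs the segment contributes. That would fit re-ord'ing dominating the run time once there are millions of unique groups.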
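So the implementation in the patch only seems to pay off when a search yields many unique groups. In these runs the committed collector is faster up to ~645000 unique groups, and the patch is faster from ~953000 onwards.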
> Improve AllGroupsCollector implementations
> ------------------------------------------
>                 Key: LUCENE-3972
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/grouping
>            Reporter: Martijn van Groningen
>         Attachments: LUCENE-3972.patch, LUCENE-3972.patch
> I think that the performance of TermAllGroupsCollector, DVAllGroupsCollector.BR and
> DVAllGroupsCollector.SortedBR can be improved by using BytesRefHash to store the groups instead
> of an ArrayList.
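The sketch below is a simplified model of the idea in the issue description, not the code in the attached patch. Groups are keyed by value instead of by segment-local ord, so nothing has to be re-ord'ed when switching segments. Here java.util.LinkedHashSet<String> stands in for BytesRefHash, and HashBasedAllGroups is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: LinkedHashSet<String> stands in for Lucene's BytesRefHash.
class HashBasedAllGroups {
    private final Set<String> groups = new LinkedHashSet<>(); // unique group values, first-seen order
    private String[] segmentTerms = new String[0];

    void setNextSegment(String[] sortedTerms) {
        // No re-ord'ing: groups are keyed by value, so per-segment cost does not grow with groups.
        segmentTerms = sortedTerms;
    }

    void collect(int ord) {
        // Per-doc cost: resolve the ord to its value and hash it (a no-op if already present).
        groups.add(segmentTerms[ord]);
    }

    List<String> groups() {
        return new ArrayList<>(groups);
    }
}
```

The trade-off is visible in the numbers. Every collected doc now pays for resolving and hashing its group value, which would explain why the committed collector wins with few unique groups (run 1). Meanwhile, the per-segment cost that grows with the number of collected groups goes away.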

This message is automatically generated by JIRA.