lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <>
Subject [jira] [Updated] (LUCENE-3122) Cascaded grouping
Date Thu, 09 May 2013 23:05:50 GMT


Uwe Schindler updated LUCENE-3122:

    Fix Version/s: 4.4 (was: 4.3)
> Cascaded grouping
> -----------------
>                 Key: LUCENE-3122
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/grouping
>            Reporter: Michael McCandless
>              Labels: gsoc2013
>             Fix For: 4.4
> Similar to SOLR-2526, in that you are grouping on two separate fields, but instead of treating
those fields as a single grouping by a compound key, this change would let you first group
on key1 to form the primary groups and then group on key2 within each primary group.
> I.e., the result you get back would have groups A, B, C (grouped by key1), but the
documents within group A would in turn be grouped by key2.
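To make the difference concrete, here is a minimal plain-Java sketch over in-memory documents. It is not Lucene's grouping API; the `Doc` record and its `key1`/`key2` fields are hypothetical stand-ins for indexed fields. The compound-key version produces one flat level of groups, while the cascaded version produces primary groups by key1, each holding secondary groups by key2:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CascadedGroupingSketch {
    // Hypothetical stand-in for an indexed document with two grouping fields.
    record Doc(String id, String key1, String key2) {}

    // Compound-key grouping: a single flat level of groups keyed by (key1, key2).
    static Map<String, List<String>> groupByCompoundKey(List<Doc> docs) {
        return docs.stream().collect(Collectors.groupingBy(
                d -> d.key1() + "|" + d.key2(),
                LinkedHashMap::new,
                Collectors.mapping(Doc::id, Collectors.toList())));
    }

    // Cascaded grouping: primary groups by key1, then secondary groups by key2 inside each one.
    static Map<String, Map<String, List<String>>> groupCascaded(List<Doc> docs) {
        return docs.stream().collect(Collectors.groupingBy(
                Doc::key1,
                LinkedHashMap::new,
                Collectors.groupingBy(
                        Doc::key2,
                        LinkedHashMap::new,
                        Collectors.mapping(Doc::id, Collectors.toList()))));
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("d1", "A", "x"),
                new Doc("d2", "A", "y"),
                new Doc("d3", "B", "x"),
                new Doc("d4", "A", "x"));
        System.out.println(groupByCompoundKey(docs)); // {A|x=[d1, d4], A|y=[d2], B|x=[d3]}
        System.out.println(groupCascaded(docs));      // {A={x=[d1, d4], y=[d2]}, B={x=[d3]}}
    }
}
```

Both versions see the same documents; only the shape of the result differs, and that shape is what matters for presentation.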
> I think this will be important for apps whose documents are the product of denormalizing,
i.e., where the Lucene document is really a sub-document of a different identifier field.  Borrowing
an example from LUCENE-3097, you have doctors, but each doctor may have multiple offices (addresses)
where they practice, so you index each doctor × address pair as a Lucene document.  In this case,
your "identifier" field (the one that "counts" for facets and should be "grouped" for presentation)
is doctorid.  When you offer users search over this index, you'd likely want to 1) group by
distance (i.e., < 0.1 miles, < 0.2 miles, etc., as a function query), but 2) also group
by doctorid within each distance group, i.e., cascaded grouping.
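A rough sketch of the doctor example, again in plain Java with hypothetical names (`OfficeDoc`, `BUCKET_LIMITS`, `distanceBucket`) rather than any Lucene API. Each document is one doctor/office pair with a distance assumed to be precomputed for the query; the primary groups are distance buckets (standing in for the function query), and within each bucket the documents are grouped by doctorId:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class DoctorGroupingSketch {
    // One document per (doctor, office) pair, i.e. the denormalized form.
    // distanceMiles is assumed to be precomputed for the current query.
    record OfficeDoc(String doctorId, String address, double distanceMiles) {}

    // Hypothetical bucket upper limits in miles, standing in for a distance function query.
    static final double[] BUCKET_LIMITS = {0.1, 0.2, 0.5};

    // Index of the first bucket whose upper limit exceeds the distance;
    // BUCKET_LIMITS.length means "farther than the last limit".
    static int distanceBucket(double miles) {
        for (int i = 0; i < BUCKET_LIMITS.length; i++) {
            if (miles < BUCKET_LIMITS[i]) {
                return i;
            }
        }
        return BUCKET_LIMITS.length;
    }

    // Primary groups: distance bucket. Secondary groups: doctorId within each bucket.
    static Map<Integer, Map<String, List<String>>> group(List<OfficeDoc> docs) {
        return docs.stream().collect(Collectors.groupingBy(
                d -> distanceBucket(d.distanceMiles()),
                TreeMap::new,
                Collectors.groupingBy(
                        OfficeDoc::doctorId,
                        TreeMap::new,
                        Collectors.mapping(OfficeDoc::address, Collectors.toList()))));
    }

    public static void main(String[] args) {
        List<OfficeDoc> docs = List.of(
                new OfficeDoc("dr1", "Main St", 0.05),
                new OfficeDoc("dr1", "Oak Ave", 0.08),
                new OfficeDoc("dr2", "Elm St", 0.15),
                new OfficeDoc("dr1", "Pine Rd", 0.9));
        Map<Integer, Map<String, List<String>>> grouped = group(docs);
        System.out.println(grouped); // {0={dr1=[Main St, Oak Ave]}, 1={dr2=[Elm St]}, 3={dr1=[Pine Rd]}}
        // Distinct doctors in the nearest bucket: dr1 counts once despite two offices.
        System.out.println(grouped.get(0).size()); // 1
    }
}
```

Note that dr1 is counted once in bucket 0 despite having two nearby offices, and appears again in a farther bucket for a third office; that is the "identifier counts" behaviour the description is after.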
> I suspect this would be easier to implement than it sounds: the per-group collector used
by the second-pass grouping collector for key1's grouping just needs to be another grouping collector.
Spookily, though, that inner collection would also have to be two-pass, which could get tricky since
grouping is essentially recursing on itself. Once we have LUCENE-3112, though, that should
enable efficient single-pass grouping by the identifier (doctorid).
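The idea of making the per-group collector another grouping collector can be sketched with a hypothetical `DocCollector` interface (not Lucene's `Collector` API). This version is single-pass and keeps every group: it deliberately ignores the top-N group selection that makes the real grouping collectors two-pass, which is exactly the part that gets tricky when nesting, since each primary group's inner collector would need its own first pass:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;
import java.util.function.Supplier;

public class NestedCollectorSketch {
    // Hypothetical minimal collector contract (not Lucene's Collector API).
    interface DocCollector {
        void collect(int doc);
    }

    // Leaf collector: just remembers the doc ids it was handed, in order.
    static final class ListCollector implements DocCollector {
        final List<Integer> docs = new ArrayList<>();
        public void collect(int doc) { docs.add(doc); }
        @Override public String toString() { return docs.toString(); }
    }

    // Grouping collector whose per-group collector comes from a factory, so the
    // per-group collector can itself be a GroupingCollector (cascaded grouping).
    static final class GroupingCollector implements DocCollector {
        private final IntFunction<String> groupKey;            // doc id -> group value
        private final Supplier<DocCollector> perGroupFactory;  // one sub-collector per new group
        final Map<String, DocCollector> groups = new LinkedHashMap<>();

        GroupingCollector(IntFunction<String> groupKey, Supplier<DocCollector> perGroupFactory) {
            this.groupKey = groupKey;
            this.perGroupFactory = perGroupFactory;
        }

        public void collect(int doc) {
            // Route the doc to its group's sub-collector, creating it on first sight.
            groups.computeIfAbsent(groupKey.apply(doc), k -> perGroupFactory.get()).collect(doc);
        }

        @Override public String toString() { return groups.toString(); }
    }

    public static void main(String[] args) {
        String[] key1 = {"A", "A", "B", "A"};  // hypothetical per-doc field values
        String[] key2 = {"x", "y", "x", "x"};

        // Outer grouping on key1; each primary group gets its own grouping collector on key2.
        GroupingCollector cascaded = new GroupingCollector(
                doc -> key1[doc],
                () -> new GroupingCollector(doc -> key2[doc], ListCollector::new));

        for (int doc = 0; doc < key1.length; doc++) {
            cascaded.collect(doc);
        }
        System.out.println(cascaded); // {A={x=[0, 3], y=[1]}, B={x=[2]}}
    }
}
```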

This message is automatically generated by JIRA.
