lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] [Commented] (LUCENE-2454) Nested Document query support
Date Mon, 20 Jun 2011 20:08:48 GMT


Michael McCandless commented on LUCENE-2454:

bq. Would modules/grouping meanwhile be a better place for this than lucene/contrib/queries?

I think modules/join is the right place?  When we factor out Solr's
generic join impl it can go there too...

I have some concerns about the current approach here (this is why I
opened LUCENE-3171):

  * prevSetBit is called for each child doc, which is an O(N^2) cost
    (N = number of child docs for one parent) I think?  Admittedly,
    "typically" N is probably small...

  * It uses 2 passes if you also want to collect child docs per

  * PerParentLimitedQuery is also O(N^2) cost, both on insert of a new
    child and on popping the child docs per group: I think it should
    use a PQ to find the lowest child to evict per parent doc?

  * I think "typically" an app will want to collect the top N groups
    (parent docs and their children), so it's more efficient to gather
    those top N and only at the end sort each set of children
    per-parent?  (This is similar to how 2nd pass grouping collector

  * PerParentLimitedQuery only supports relevance sort w/in each

  * You don't get the parent/child structure back, from
    PerParentLimitedQuery (but now we have TopGroups which is a great
    match for representing each parent and its children).
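
To make the PQ suggestion above concrete, here is a minimal sketch that keeps
one bounded min-heap of child hits per parent doc. The lowest-scoring child is
always at the head, so evicting it costs O(log K) rather than a scan of all
retained children, and each child list is sorted only once, at the end. This is
not code from the patch: PerParentTopChildren, ChildHit, collect and
topChildren are hypothetical names, java.util.PriorityQueue stands in for
Lucene's own PriorityQueue, and it assumes Java 8+ and relevance (score) sort.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical helper: keeps the top-K scoring children for each parent doc. */
public class PerParentTopChildren {

    /** A child hit (doc id plus score); illustrative type, not a Lucene class. */
    static final class ChildHit {
        final int doc;
        final float score;

        ChildHit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Lowest score first, so the head of each queue is the eviction candidate. */
    private static final Comparator<ChildHit> LOWEST_FIRST =
        (a, b) -> Float.compare(a.score, b.score);

    private final int maxChildrenPerParent;
    private final Map<Integer, PriorityQueue<ChildHit>> queues = new HashMap<>();

    public PerParentTopChildren(int maxChildrenPerParent) {
        if (maxChildrenPerParent < 1) {
            throw new IllegalArgumentException("maxChildrenPerParent must be >= 1");
        }
        this.maxChildrenPerParent = maxChildrenPerParent;
    }

    /** Offers one child hit; O(log K) per call, where K = maxChildrenPerParent. */
    public void collect(int parentDoc, int childDoc, float score) {
        PriorityQueue<ChildHit> pq = queues.computeIfAbsent(
            parentDoc,
            k -> new PriorityQueue<ChildHit>(maxChildrenPerParent, LOWEST_FIRST));
        if (pq.size() < maxChildrenPerParent) {
            pq.add(new ChildHit(childDoc, score));
        } else if (score > pq.peek().score) {
            pq.poll(); // evict the current lowest-scoring child
            pq.add(new ChildHit(childDoc, score));
        }
    }

    /** Returns the retained child doc ids for one parent, best score first. */
    public List<Integer> topChildren(int parentDoc) {
        PriorityQueue<ChildHit> pq = queues.get(parentDoc);
        if (pq == null) {
            return Collections.emptyList();
        }
        List<ChildHit> hits = new ArrayList<>(pq);
        hits.sort(LOWEST_FIRST.reversed()); // sort once, only at the end
        List<Integer> docs = new ArrayList<>(hits.size());
        for (ChildHit hit : hits) {
            docs.add(hit.doc);
        }
        return docs;
    }
}
```

A caller would invoke collect(parentDoc, childDoc, score) for every matching
child and then read topChildren(parentDoc) for each parent it wants to display.
Supporting a sort other than relevance would mean swapping the comparator.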

If you always only use PerParentLimitedQuery on the top parents from
the first pass, eg you AND/filter it against those parent docs, then
the O(N^2) cost is less severe since it'll have a small constant in
front, but since it's a Query I imagine users will use it w/o that
filter, which is bad... I think using a TopN Collector is a better match

> Nested Document query support
> -----------------------------
>                 Key: LUCENE-2454
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: core/search
>    Affects Versions: 3.0.2
>            Reporter: Mark Harwood
>            Assignee: Mark Harwood
>            Priority: Minor
>         Attachments: LUCENE-2454.patch, LUCENE-2454.patch,
> A facility for querying nested documents in a Lucene index as outlined in

