lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-2454) Nested Document query support
Date Mon, 20 Jun 2011 20:08:48 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052194#comment-13052194 ]

Michael McCandless commented on LUCENE-2454:
--------------------------------------------

bq. Would modules/grouping meanwhile be a better place for this than lucene/contrib/queries?

I think modules/join is the right place?  When we factor out Solr's
generic join impl it can go there too...

I have some concerns about the current approach here (this is why I
opened LUCENE-3171):

  * prevSetBit is called for each child doc, which I think is an
    O(N^2) cost (N = number of child docs for one parent)?
    Admittedly, "typically" N is probably small...

  * It uses 2 passes if you also want to collect child docs per
    parent

  * PerParentLimitedQuery is also O(N^2) cost, both on insert of a new
    child and on popping the child docs per group: I think it should
    use a PQ to find the lowest child to evict per parent doc?

  * I think "typically" an app will want to collect the top N groups
    (parent docs and their children), so it's more efficient to gather
    those top N and only at the end sort each set of children
    per-parent?  (This is similar to how the 2nd pass grouping
    collector works).

  * PerParentLimitedQuery only supports relevance sort w/in each
    parent.

  * You don't get the parent/child structure back from
    PerParentLimitedQuery (but now we have TopGroups, which is a great
    match for representing each parent and its children).
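
To illustrate the prevSetBit concern in the first bullet, here is a
rough sketch of caching the parent lookup so the backward scan runs
once per parent block rather than once per child.  It is not code from
the patch: ParentLookup is a hypothetical helper, java.util.BitSet
stands in for Lucene's bit set, and it assumes each parent doc is
indexed immediately before its children (which is what a prevSetBit
lookup implies).

```java
import java.util.BitSet;

/** Hypothetical helper (not from the LUCENE-2454 patch). */
final class ParentLookup {
    private final BitSet parentBits;   // one bit set per parent doc id
    private int currentParent = -1;    // parent of the most recent child
    private int nextParent = -1;       // first parent after currentParent

    ParentLookup(BitSet parentBits) {
        this.parentBits = parentBits;
    }

    /**
     * Returns the parent doc id for childDoc (childDoc >= 0), or -1 if
     * no parent bit is set at or before it.  The bit set is scanned only
     * when childDoc falls outside the cached parent block.
     */
    int parentOf(int childDoc) {
        boolean inCachedBlock = currentParent >= 0
                && childDoc >= currentParent
                && childDoc < nextParent;
        if (!inCachedBlock) {
            currentParent = parentBits.previousSetBit(childDoc);
            if (currentParent < 0) {
                return -1;             // nothing cached; rescan next time
            }
            int next = parentBits.nextSetBit(childDoc + 1);
            nextParent = (next < 0) ? Integer.MAX_VALUE : next;
        }
        return currentParent;
    }
}
```

The cache pays off when child docs arrive in increasing doc id order,
as they do when a scorer advances through a segment.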

If you always only use PerParentLimitedQuery on the top parents from
the first pass, eg you AND/filter it against those parent docs, then
the O(N^2) cost is less severe since it'll have a small constant in
front, but since it's a Query I imagine users will use it w/o that
filter, which is bad... I think using a TopN Collector is a better match
here.
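
As a rough sketch of the PQ idea (again not code from the patch;
PerParentTopChildren and ChildHit are hypothetical names, and
java.util.PriorityQueue stands in for Lucene's own PriorityQueue), a
collector could keep one bounded min-heap per parent.  The lowest
scoring child sits at the head, so finding and evicting it costs
O(log k) instead of a linear scan, and the children are sorted only
when they are read out at the end:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical helper (not from the LUCENE-2454 patch). */
final class PerParentTopChildren {

    /** A child hit: doc id plus relevance score. */
    static final class ChildHit {
        final int doc;
        final float score;

        ChildHit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final int maxChildrenPerParent;
    private final Map<Integer, PriorityQueue<ChildHit>> queues = new HashMap<>();

    PerParentTopChildren(int maxChildrenPerParent) {
        if (maxChildrenPerParent < 1) {
            throw new IllegalArgumentException("maxChildrenPerParent must be >= 1");
        }
        this.maxChildrenPerParent = maxChildrenPerParent;
    }

    /** Offers one child hit; keeps only the best maxChildrenPerParent per parent. */
    void collect(int parentDoc, int childDoc, float score) {
        // Min-heap on score: the weakest retained child is always at the head.
        PriorityQueue<ChildHit> pq = queues.computeIfAbsent(parentDoc,
                k -> new PriorityQueue<>(maxChildrenPerParent,
                        (a, b) -> Float.compare(a.score, b.score)));
        if (pq.size() < maxChildrenPerParent) {
            pq.add(new ChildHit(childDoc, score));
        } else if (score > pq.peek().score) {
            pq.poll();                              // evict the weakest child
            pq.add(new ChildHit(childDoc, score));
        }
    }

    /** Returns the retained children of parentDoc, best score first. */
    List<ChildHit> topChildren(int parentDoc) {
        PriorityQueue<ChildHit> pq = queues.get(parentDoc);
        if (pq == null) {
            return Collections.emptyList();
        }
        List<ChildHit> hits = new ArrayList<>(pq);
        hits.sort((a, b) -> Float.compare(b.score, a.score));
        return hits;
    }
}
```

This only covers relevance sort within each parent; a real collector
would also need to bound the number of parents it tracks.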


> Nested Document query support
> -----------------------------
>
>                 Key: LUCENE-2454
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2454
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: core/search
>    Affects Versions: 3.0.2
>            Reporter: Mark Harwood
>            Assignee: Mark Harwood
>            Priority: Minor
>         Attachments: LUCENE-2454.patch, LUCENE-2454.patch, LuceneNestedDocumentSupport.zip
>
>
> A facility for querying nested documents in a Lucene index as outlined in http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene
