lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-2454) Nested Document query support
Date Tue, 21 Jun 2011 10:34:48 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052459#comment-13052459 ]

Michael McCandless commented on LUCENE-2454:
--------------------------------------------

{quote}
bq. It uses 2 passes if you also want to collect child docs per parent

I tend to work with distributed indexes so it involves a 2 pass op anyway - one to understand
best parents across the multiple shards first then the perparentlimitedquery to ensure we
only pay the retrieve costs for those parents that make the final cut.
{quote}

The distributed case can still be done in a single pass using
LUCENE-3171: each shard returns its top groups, and these are then
merged in the front end. This should be substantially faster than
making a second pass out to all shards.

Also, we now have TopDocs.merge/TopGroups.merge to support this use
case.
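To illustrate the front-end merge step, here is a minimal, self-contained Java sketch. It does not use Lucene's TopDocs.merge or TopGroups.merge; the ShardGroupMerge class and its Group type are hypothetical, and the sketch assumes each shard returns its top groups already sorted by descending score. A k-way merge over those per-shard lists yields the global top N without a second round trip to the shards.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch of the front-end merge: each shard returns its own
 * top-N parent groups (sorted by descending score) and the front end keeps
 * only the global top N, without going back to the shards.
 */
public class ShardGroupMerge {

    /** Hypothetical per-shard result: one parent group and its score. */
    public static final class Group {
        public final int shard;
        public final String parentId;
        public final float score;

        public Group(int shard, String parentId, float score) {
            this.shard = shard;
            this.parentId = parentId;
            this.score = score;
        }
    }

    /** Read position into one shard's sorted list. */
    private static final class Cursor {
        final List<Group> groups;
        int pos;

        Cursor(List<Group> groups) {
            this.groups = groups;
        }

        Group current() {
            return groups.get(pos);
        }
    }

    /** K-way merge of per-shard lists, each sorted by descending score. */
    public static List<Group> merge(List<List<Group>> perShard, int topN) {
        // Highest current score first; ties broken by lower shard index.
        PriorityQueue<Cursor> pq = new PriorityQueue<>(
            Comparator.comparingDouble((Cursor c) -> c.current().score).reversed()
                .thenComparingInt(c -> c.current().shard));
        for (List<Group> shardGroups : perShard) {
            if (!shardGroups.isEmpty()) {
                pq.add(new Cursor(shardGroups));
            }
        }
        List<Group> result = new ArrayList<>();
        while (result.size() < topN && !pq.isEmpty()) {
            Cursor c = pq.poll();       // shard with the best remaining group
            result.add(c.current());
            c.pos++;                    // advance only after removing from the queue
            if (c.pos < c.groups.size()) {
                pq.add(c);              // re-insert with its next group
            }
        }
        return result;
    }
}
```

Breaking ties by shard index is an arbitrary choice for the sketch; any deterministic tie-break works as long as every front end applies the same one.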

bq. This overlaps with the BlockJoinQuery of LUCENE-3171, this issue might even be closed
as duplicate of that one. Which one is preferred?

I think they are likely dups of one another and I agree we need to
make sure all important use cases are covered.

bq. Apps commonly need to return a selection of both matching and non-matching children along
with the "best" parents.

LUCENE-3171 can do this as well, with the same approach as here,
i.e., by doing 2 passes with two different child queries.

However, I think that for both this issue and LUCENE-3171, this means
each child doc must have the parent's PK indexed against it, right?
I.e., for that 2nd query you need some way to return all child docs
under any of the top parents, so the child query is "parentID MUST be
in XX, YY, ZZ" and "childDoc SHOULD XYZ".
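A minimal in-memory model of that second-pass child query, assuming each child carries its parent's PK in a parentId field. The SecondPassChildren class, the ChildDoc type, and the predicate standing in for "XYZ" are hypothetical, not Lucene API, and scoring is approximated by a simple matched-first ordering. The MUST clause filters children down to the top parents, while the SHOULD clause only changes the order, so non-matching children of top parents are still returned.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/**
 * Hypothetical model of: parentID MUST be in the top parents,
 * childDoc SHOULD match XYZ.
 */
public class SecondPassChildren {

    public static final class ChildDoc {
        public final String id;
        public final String parentId; // parent's PK, indexed on every child
        public final String text;

        public ChildDoc(String id, String parentId, String text) {
            this.id = id;
            this.parentId = parentId;
            this.text = text;
        }
    }

    public static List<ChildDoc> select(List<ChildDoc> children,
                                        Set<String> topParents,
                                        Predicate<ChildDoc> should) {
        List<ChildDoc> hits = new ArrayList<>();
        for (ChildDoc c : children) {
            if (topParents.contains(c.parentId)) { // MUST clause: filter
                hits.add(c);
            }
        }
        // SHOULD clause: matching children sort ahead of non-matching ones.
        // List.sort is stable, so ties keep their original order.
        hits.sort(Comparator.comparingInt((ChildDoc c) -> should.test(c) ? 0 : 1));
        return hits;
    }
}
```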

In fact, we could make this a single-pass capability with LUCENE-3171,
without requiring each child doc to index its parent PK: because
collection within each parent is done when you retrieve the TopGroups,
we could also pull & sort all the other, non-matching children under
any top parent at that point. But this can be a later enhancement.
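A sketch of why the parent PK would not be needed: assuming LUCENE-3171-style block indexing, where each parent is indexed right after its children in one contiguous block, the children of a parent can be found from doc IDs alone. BlockChildren, the isParent array, and the convention that a child score above zero means "matched" are hypothetical simplifications for illustration, not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: with block indexing (children first, parent last in
 * each block), a parent's children are the docs between the previous parent
 * and this one.
 */
public class BlockChildren {

    /**
     * Returns the child doc IDs of parentDoc: matching children first
     * (by descending score), then non-matching children in doc-ID order.
     */
    public static List<Integer> childrenOf(boolean[] isParent, float[] childScore, int parentDoc) {
        if (!isParent[parentDoc]) {
            throw new IllegalArgumentException("not a parent doc: " + parentDoc);
        }
        // Walk back to the previous parent (or the start of the index).
        int start = parentDoc - 1;
        while (start >= 0 && !isParent[start]) {
            start--;
        }
        start++; // first child of this block

        List<Integer> matching = new ArrayList<>();
        List<Integer> nonMatching = new ArrayList<>();
        for (int doc = start; doc < parentDoc; doc++) {
            if (childScore[doc] > 0f) {
                matching.add(doc);
            } else {
                nonMatching.add(doc);
            }
        }
        matching.sort((a, b) -> Float.compare(childScore[b], childScore[a]));

        List<Integer> result = new ArrayList<>(matching);
        result.addAll(nonMatching); // non-matching children are kept too
        return result;
    }
}
```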


> Nested Document query support
> -----------------------------
>
>                 Key: LUCENE-2454
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2454
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: core/search
>    Affects Versions: 3.0.2
>            Reporter: Mark Harwood
>            Assignee: Mark Harwood
>            Priority: Minor
>         Attachments: LUCENE-2454.patch, LUCENE-2454.patch, LuceneNestedDocumentSupport.zip
>
>
> A facility for querying nested documents in a Lucene index as outlined in http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

