lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-2454) Nested Document query support
Date Thu, 26 May 2011 10:17:48 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039623#comment-13039623 ]

Michael McCandless commented on LUCENE-2454:
--------------------------------------------

bq. I'll need to check LUCENE-3129 for equivalence with PerParentLimitQuery. It's certainly
a central part of what I typically deploy for nested queries - pass 1 is usually a NestedDocumentQuery
to get the best parents and pass 2 uses PerParentLimitQuery to get the best children for these
best parents.

Hmm, so I wonder if we could do this in one pass?  Ie, like grouping,
if you indexed your docs as blocks, you can use the faster single-pass
collector; but if you didn't, you can use the more general but slower
and more-RAM-consuming two pass collector.

It seems like we should be able to do something similar with joins,
somehow... ie Solr's join impl is a start at the "fully general"
two-pass solution.

But I agree the "join child to parent" and then "grouping of child
docs" go hand in hand for searching...
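
To make the single-pass idea concrete, here is a minimal sketch of joining child hits to their parents and summarising them in one sweep. It assumes docs were indexed as blocks with the parent last, that child hits arrive in increasing doc ID order, and that parents are marked in a bit set. The class and method names (SinglePassJoin, ParentHit, joinToParents) are hypothetical, java.util.BitSet stands in for Lucene's bit set, and summing child scores is just one possible way to summarise; none of this is taken from the patch.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class SinglePassJoin {
    /** One parent plus a summary of its matching children (hypothetical type). */
    static final class ParentHit {
        final int parentDoc;
        final float scoreSum;
        final int childCount;

        ParentHit(int parentDoc, float scoreSum, int childCount) {
            this.parentDoc = parentDoc;
            this.scoreSum = scoreSum;
            this.childCount = childCount;
        }
    }

    /**
     * Sweeps child hits once, in doc ID order, and rolls each run of children
     * up to the parent that closes its block (parent-last layout assumed).
     */
    static List<ParentHit> joinToParents(int[] childDocs, float[] childScores, BitSet parents) {
        List<ParentHit> out = new ArrayList<>();
        int i = 0;
        while (i < childDocs.length) {
            // The enclosing parent is the first parent bit at or after the child.
            int parent = parents.nextSetBit(childDocs[i]);
            if (parent < 0) {
                break; // trailing docs with no parent: nothing more to join
            }
            float sum = 0f;
            int count = 0;
            // Consume every child hit that falls inside this parent's block.
            while (i < childDocs.length && childDocs[i] <= parent) {
                sum += childScores[i];
                count++;
                i++;
            }
            out.add(new ParentHit(parent, sum, count));
        }
        return out;
    }
}
```

Because every child's parent sits right after it in doc ID order, no per-parent map or second search is needed, which is where the RAM and speed advantage over a general two-pass join would come from.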

What do you do for facet counting in these apps...?  Post-grouping
faceting also ties in here.

bq. Of course some apps can simply fetch ALL children for the top parents but in some cases
summarising children is required

Right...

bq.  (note: this is potentially a great solution for performance issues on highlighting big
docs e.g. entire books).

I think it'd be compelling to index book/articles with each
page/section/chapter being a new doc, and then group them under their
book/article.

bq. I haven't benchmarked nextSetBit vs the existing "rewind" implementation but I imagine
it may be quicker.

I think it should be much faster -- OpenBitSet.nextSetBit looks heavily
optimized, since it can operate a word at a time.  Though, if the
groups are smallish, so that the next set bit is often only 2 or 3 bits
away, I'm not sure it'd be faster...
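
For reference, a rough sketch of the two lookups being compared. It uses java.util.BitSet as a stand-in for OpenBitSet, and it assumes "rewind" means a linear backward scan from the child to its parent in a parent-first layout; the patch's actual rewind code may differ. ParentLookup and its method names are made up for illustration.

```java
import java.util.BitSet;

final class ParentLookup {
    /** Parent-last layout: the parent is the first set bit at or after the child. */
    static int parentViaNextSetBit(BitSet parents, int childDoc) {
        return parents.nextSetBit(childDoc);
    }

    /**
     * Parent-first layout, assumed "rewind" style: step back one doc at a time
     * until a parent bit is found. Returns -1 if there is none.
     */
    static int parentViaRewind(BitSet parents, int childDoc) {
        int doc = childDoc;
        while (doc >= 0 && !parents.get(doc)) {
            doc--;
        }
        return doc;
    }
}
```

With parents at docs 3 and 6 in a parent-last index, parentViaNextSetBit(parents, 1) returns 3. For blocks of only a few docs both loops touch a handful of bits, so any difference would have to come out of a benchmark.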

bq. Parent-followed-by-children seems more natural from a user's point of view however.

But is it really so bad to ask the app to put parent doc last?

I mean, the docs have to be indexed w/ the new doc block APIs in IW
anyway, which will often be eg a List<Document>, at which point
putting parent last seems a minor imposition?
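
As a sketch of how small that imposition is: the app builds its list of children and appends the parent at the end. Below, plain strings stand in for Lucene Document objects and doc IDs are simulated by counting, so BlockLayout, block and parentBits are illustrative names only, not Lucene API.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class BlockLayout {
    /** Builds one block in index order: children first, parent last. */
    static List<String> block(String parent, List<String> children) {
        List<String> docs = new ArrayList<>(children);
        docs.add(parent);
        return docs;
    }

    /**
     * Simulates sequential doc ID assignment across blocks and marks the
     * last doc of each non-empty block as a parent.
     */
    static BitSet parentBits(List<List<String>> blocks) {
        BitSet parents = new BitSet();
        int nextDoc = 0;
        for (List<String> b : blocks) {
            if (b.isEmpty()) {
                continue;
            }
            nextDoc += b.size();
            parents.set(nextDoc - 1); // parent occupies the final slot of its block
        }
        return parents;
    }
}
```

A book with three pages followed by a book with two pages would give parent bits at docs 3 and 6.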

Since this is an expert API I think it's OK to put [minor] impositions
on its usage if this can simplify the impl / make it faster / less
risky.  That said, I'm not yet sure on the impl (single pass query +
collector vs generic two-pass join that solr now has), so it's
probably premature to worry about this...

bq. I guess you could always keep the parent-then-child insertion order but flip the bitset
(then cache) for query execution if that was faster.

True but this adds some hair into the impl (we must also "flip" coming
back from nextSetBit)...
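
To show where the extra hair comes from, here is a minimal sketch of the flip approach, assuming a parent-first index, a known maxDoc, and java.util.BitSet in place of OpenBitSet. FlippedParents and its methods are hypothetical names. Every lookup has to mirror the doc ID going in and mirror the result coming back out.

```java
import java.util.BitSet;

final class FlippedParents {
    /** Mirrors a parent-first bit set so that bit d becomes bit (maxDoc - 1 - d). */
    static BitSet flip(BitSet parents, int maxDoc) {
        BitSet flipped = new BitSet(maxDoc);
        for (int d = parents.nextSetBit(0); d >= 0 && d < maxDoc; d = parents.nextSetBit(d + 1)) {
            flipped.set(maxDoc - 1 - d);
        }
        return flipped;
    }

    /**
     * Finds the parent of childDoc in the original (parent-first) numbering
     * by running nextSetBit on the flipped set. Returns -1 if none is found.
     */
    static int parentOf(BitSet flipped, int maxDoc, int childDoc) {
        int mirroredChild = maxDoc - 1 - childDoc;           // flip going in
        int mirroredParent = flipped.nextSetBit(mirroredChild);
        return mirroredParent < 0 ? -1 : maxDoc - 1 - mirroredParent; // flip coming back
    }
}
```

The flipped set could be built once and cached per reader, but the two index translations remain on every call.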

bq. Benchmarking rewind vs nextSetBit vs flip then nextSetBit would reveal all.

True, though it'd be best to do this in the context of the actual join impl...


> Nested Document query support
> -----------------------------
>
>                 Key: LUCENE-2454
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2454
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: core/search
>    Affects Versions: 3.0.2
>            Reporter: Mark Harwood
>            Assignee: Mark Harwood
>            Priority: Minor
>         Attachments: LUCENE-2454.patch, LuceneNestedDocumentSupport.zip
>
>
> A facility for querying nested documents in a Lucene index as outlined in http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
