lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] [Commented] (LUCENE-3171) BlockJoinQuery/Collector
Date Tue, 21 Jun 2011 16:19:47 GMT


Michael McCandless commented on LUCENE-3171:

bq. BlockJoinQuery still needs hashCode/equals

Woops, thanks, I'll add!

bq. and a javadoc note (as I remarked earlier at 2454) about the possible inefficiency of the
use of OpenBitSet for larger group sizes. When the typical group size gets a lot bigger than
the number of bits in a long, another implementation might be faster. This remark in the javadocs
would allow us to wait for someone to come along with bigger group sizes and a real performance
problem here.

Hmm: do you have an improvement in mind for OpenBitSet.prevSetBit to better handle large groups?
Or, where is this possible inefficiency (is it something specific)?
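
To make the large-group concern concrete, here is a minimal sketch. It uses java.util.BitSet as a stand-in for OpenBitSet (an assumption: previousSetBit plays the role of prevSetBit here), and the class and method names are hypothetical. It only counts how many 64-bit words a word-by-word backward scan would cross between two consecutive parent docs; it does not measure the actual patch.

```java
import java.util.BitSet;

// Hypothetical illustration, not code from the LUCENE-3171 patch.
class ParentBitsScan {
    // DocID of the closest parent strictly before parentDoc, or -1 if none.
    static int prevParent(BitSet parents, int parentDoc) {
        return parents.previousSetBit(parentDoc - 1);
    }

    // Number of 64-bit words a word-by-word backward scan crosses when
    // looking for the parent that precedes parentDoc.
    static int wordsScanned(BitSet parents, int parentDoc) {
        int start = parentDoc - 1;
        if (start < 0) {
            return 0; // nothing before doc 0
        }
        int prev = parents.previousSetBit(start);
        int lastWord = prev < 0 ? 0 : prev >> 6; // scan ends at word 0 if no parent is found
        return (start >> 6) - lastWord + 1;
    }

    public static void main(String[] args) {
        BitSet small = new BitSet();
        small.set(3);
        small.set(7);      // group of 3 children: docs 4..6
        BitSet large = new BitSet();
        large.set(3);
        large.set(1003);   // group of 999 children: docs 4..1002

        System.out.println(wordsScanned(small, 7));    // 1
        System.out.println(wordsScanned(large, 1003)); // 16
    }
}
```

Under these assumptions the scan cost grows roughly as group size / 64, so it would only start to matter once groups are much larger than one long.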

bq. I would prefer to use single pass and for now I only need the parent docs. That means
that I have no preference for 2454 or this one.

I wonder how often apps "typically" need just the parent docs vs the groups (w/ child docs)...

But still, this patch only calls .nextSetBit() once per group, so that ought to be faster than
LUCENE-2454, I think... hmm, unless you typically only have 1 child match per parent.
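
A rough sketch of the "one nextSetBit per group" idea follows. It is not the patch's code: java.util.BitSet stands in for OpenBitSet, the child matches are a plain sorted int[] rather than a Scorer, and the class and method names are hypothetical. It assumes docs were indexed as blocks with the parent last, so the parent of a child match is the next set bit at or after the child's docID.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not code from the LUCENE-3171 patch.
class BlockJoinSketch {
    // Groups sorted child docIDs under their parent docID in a single pass.
    static Map<Integer, List<Integer>> groupByParent(BitSet parents, int[] childMatches) {
        Map<Integer, List<Integer>> groups = new LinkedHashMap<>();
        int i = 0;
        while (i < childMatches.length) {
            // One bit-set lookup per group: the parent closes the block.
            int parentDoc = parents.nextSetBit(childMatches[i]);
            if (parentDoc < 0) {
                break; // trailing children without a parent: malformed block
            }
            if (parentDoc == childMatches[i]) {
                i++; // a "child" match that is itself a parent doc: skip it
                continue;
            }
            List<Integer> children = new ArrayList<>();
            // Every further match below parentDoc belongs to the same block.
            while (i < childMatches.length && childMatches[i] < parentDoc) {
                children.add(childMatches[i++]);
            }
            groups.put(parentDoc, children);
        }
        return groups;
    }

    public static void main(String[] args) {
        BitSet parents = new BitSet();
        parents.set(2);
        parents.set(5);
        parents.set(9);
        int[] childMatches = {0, 1, 3, 7, 8};
        // prints {2=[0, 1], 5=[3], 9=[7, 8]}
        System.out.println(groupByParent(parents, childMatches));
    }
}
```

With only one child match per parent, the outer loop runs once per child, so the saving over a per-child lookup disappears; that is the caveat above.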

> BlockJoinQuery/Collector
> ------------------------
>                 Key: LUCENE-3171
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>            Reporter: Michael McCandless
>             Fix For: 3.3, 4.0
>         Attachments: LUCENE-3171.patch, LUCENE-3171.patch
>
> I created a single-pass Query + Collector to implement nested docs.
> The approach is similar to LUCENE-2454, in that the app must index
> documents in "join order", as a block (IW.add/updateDocuments), with
> the parent doc at the end of the block, except that this impl is one
> pass.
>
> Once you join at indexing time, you can take any query that matches
> child docs and join it up to the parent docID space, using
> BlockJoinQuery.  You then use BlockJoinCollector, which sorts parent
> docs by provided Sort, to gather results, grouped by parent; this
> collector finds any BlockJoinQuerys (using Scorer.visitScorers) and
> retains the child docs corresponding to each collected parent doc.
> After searching is done, you retrieve the TopGroups from a provided
> BlockJoinQuery.
>
> Like LUCENE-2454, this is less general than the arbitrary joins in
> Solr (SOLR-2272) or parent/child from ElasticSearch, since you
> must do the join at indexing time as a doc block, but it should be
> able to handle nested joins as well as joins to multiple tables,
> though I don't yet have test cases for these.
>
> I put this in a new Join module (modules/join); I think as we
> refactor join impls we should put them here.
