lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Paul Elschot (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (LUCENE-3171) BlockJoinQuery/Collector
Date Tue, 21 Jun 2011 19:21:47 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052770#comment-13052770
] 

Paul Elschot edited comment on LUCENE-3171 at 6/21/11 7:20 PM:
---------------------------------------------------------------

The possible inefficiency is the same as the one for a any sparsely filled OpenBitSet.

Another implementation (should be another issue, but since you asked...) could be a set of
increasing integers, based on a balanced tree structure with a moderate fanout (e.g. 32),
and all integer values relative to the minimum determined by the data for the pointer from
the parent. The whole thing could be stored in one int[], the pointers would be (forward)
indexes into this one array, and each internal node would consist of two rows of integers
(one data, one pointers), and each row would be compressed as a frame of reference into the
array.

This thing can implement {code}int next(int x){code} and {code}int previous(int x){code} easily,
and an iterator over this can implement {code}advance(target){code} for a DocIdSetIterator,
and because of the symmetry it can also do that in the reverse direction as needed here.
Compression at higher levels might not be necessary.

For now, there is no code for this, except for the frame of reference.
Occasionaly the need for a more space efficient filter shows up on the mailing lists, so if
anyone wants to give this a try...



      was (Author: paul.elschot@xs4all.nl):
    The possible inefficiency is the same as the one for a any sparsely filled OpenBitSet.

Another implementation (should be another issue, but since you asked...) could be a set of
increasing integers, based on a balanced tree structure with a moderate fanout (e.g. 32),
and all integer values relative to the minimum determined by the data for the pointer from
the parent. The whole thing could be stored in one int[], the pointers would be (forward)
indexes into this one array, and each internal node would consist of two rows of integers
(one data, one pointers), and each row would be compressed as a frame of reference into the
array.

This thing can implement {code}int next(int x){code} and {code}int previous(int x){code} easily,
and an iterator over this can implement {code}advance(target){code} for a DocIdSetIterator,
and because of the symmetry it can also do that in the reverse direction as needed here.
Compression at higher levels might not be necessary.

For now, there is code for this, except for the frame of reference.
Occasionaly the need for a more space efficient filter shows up on the mailing lists, so if
anyone want to give this a try...


  
> BlockJoinQuery/Collector
> ------------------------
>
>                 Key: LUCENE-3171
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3171
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>            Reporter: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3171.patch, LUCENE-3171.patch, LUCENE-3171.patch
>
>
> I created a single-pass Query + Collector to implement nested docs.
> The approach is similar to LUCENE-2454, in that the app must index
> documents in "join order", as a block (IW.add/updateDocuments), with
> the parent doc at the end of the block, except that this impl is one
> pass.
> Once you join at indexing time, you can take any query that matches
> child docs and join it up to the parent docID space, using
> BlockJoinQuery.  You then use BlockJoinCollector, which sorts parent
> docs by provided Sort, to gather results, grouped by parent; this
> collector finds any BlockJoinQuerys (using Scorer.visitScorers) and
> retains the child docs corresponding to each collected parent doc.
> After searching is done, you retrieve the TopGroups from a provided
> BlockJoinQuery.
> Like LUCENE-2454, this is less general than the arbitrary joins in
> Solr (SOLR-2272) or parent/child from ElasticSearch
> (https://github.com/elasticsearch/elasticsearch/issues/553), since you
> must do the join at indexing time as a doc block, but it should be
> able to handle nested joins as well as joins to multiple tables,
> though I don't yet have test cases for these.
> I put this in a new Join module (modules/join); I think as we
> refactor join impls we should put them here.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message