accumulo-notifications mailing list archives

From "William Slacum (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-1682) Iterator and example to support intersection of document-partitioned index terms by ranges with lower and upper bounds.
Date Wed, 04 Sep 2013 13:03:52 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757724#comment-13757724 ]

William Slacum commented on ACCUMULO-1682:
------------------------------------------

The real issue is that you can't do a sorted merge join on unsorted data. Since the document
IDs are the last part of the index entry in the key structure, they will be unsorted when
you read across multiple terms. I believe Adam attempted to resolve this by using an isolated
scanner, consuming the entire range into a sorted map, and using that map as a leaf
node/term source.
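
To illustrate the sorting problem, here is a minimal sketch in plain Java. It does not use
the Accumulo iterator API; the class name RangeTermSource and both method names are
hypothetical, and document IDs are modelled as plain strings. It buffers the document IDs
read across every term in a range into a sorted set, then feeds that set to an ordinary
sorted merge intersection.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; not part of Accumulo.
class RangeTermSource {

    // Collect the doc IDs found under every term in a range into one sorted set.
    // Within a single term the IDs arrive sorted, but across terms they do not.
    static TreeSet<String> bufferSorted(List<List<String>> docIdsPerTerm) {
        TreeSet<String> sorted = new TreeSet<>();
        for (List<String> ids : docIdsPerTerm) {
            sorted.addAll(ids);
        }
        return sorted;
    }

    // Sorted merge intersection. Both inputs must iterate in ascending order,
    // otherwise matches are silently skipped.
    static List<String> intersect(Iterable<String> left, Iterable<String> right) {
        List<String> out = new ArrayList<>();
        Iterator<String> l = left.iterator();
        Iterator<String> r = right.iterator();
        if (!l.hasNext() || !r.hasNext()) {
            return out;
        }
        String a = l.next();
        String b = r.next();
        while (true) {
            int cmp = a.compareTo(b);
            if (cmp == 0) {
                out.add(a);
                if (!l.hasNext() || !r.hasNext()) break;
                a = l.next();
                b = r.next();
            } else if (cmp < 0) {
                if (!l.hasNext()) break;
                a = l.next();   // advance the side that is behind
            } else {
                if (!r.hasNext()) break;
                b = r.next();
            }
        }
        return out;
    }
}
```

With NAME=Joe yielding doc1, doc3, doc5 and an age range whose terms yield doc3, doc9 and
then doc1, doc5, the buffered intersection returns doc1, doc3, doc5. Feeding the
concatenated, unsorted stream doc3, doc9, doc1, doc5 straight into intersect returns only
doc3.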

There are a couple of strategies you can try for this. One is to use a temporary store,
such as an HDFS-backed sorted set or a local KV store like LevelDB (I was partial to
HawtDB), that you can write the range data out to and then read back from. This is similar
to the map idea but is not constrained by memory.

You could also build a composite index using a space-filling curve and use a predicate or
multiple ranges to cull extraneous data.
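
As a rough sketch of the space-filling curve idea, the following plain Java interleaves two
small non-negative integers into a Z-order (Morton) value and filters decoded values against
a query box. The choice of a Z-order curve, the 16-bit limit per dimension, and the names
ZOrder, interleave and inBox are all assumptions made for illustration; none of this is an
Accumulo API.

```java
// Hypothetical illustration only; not part of Accumulo.
class ZOrder {

    // Interleave the low 16 bits of x and y: x fills even bit positions, y odd ones.
    // Both inputs are assumed to be in [0, 65535].
    static long interleave(int x, int y) {
        long z = 0;
        for (int i = 0; i < 16; i++) {
            z |= ((long) (x >> i) & 1L) << (2 * i);
            z |= ((long) (y >> i) & 1L) << (2 * i + 1);
        }
        return z;
    }

    // Predicate that culls values lying inside a scanned Z-range but outside
    // the rectangular query box [xLo, xHi] x [yLo, yHi].
    static boolean inBox(long z, int xLo, int xHi, int yLo, int yHi) {
        int x = 0;
        int y = 0;
        for (int i = 0; i < 16; i++) {
            x |= (int) ((z >> (2 * i)) & 1L) << i;
            y |= (int) ((z >> (2 * i + 1)) & 1L) << i;
        }
        return x >= xLo && x <= xHi && y >= yLo && y <= yHi;
    }
}
```

Every point in the box has a Z-value between interleave(xLo, yLo) and interleave(xHi, yHi),
so a single scan over that range followed by the inBox predicate finds all matches. That
range also covers points outside the box, which is why either the predicate or a split into
several tighter ranges is needed. Storing the value in a key would additionally require a
fixed-width, sort-preserving encoding.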
                
> Iterator and example to support intersection of document-partitioned index terms by ranges with lower and upper bounds.
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: ACCUMULO-1682
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1682
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Corey J. Nolet
>            Priority: Minor
>              Labels: proposal
>
> The current IntersectingIterator seeks to discrete terms that are encoded into the column
> families to find all column qualifiers that share all of the discrete column families of interest
> (with the additional ability to negate some of the column families). Looking at the current
> IntersectingIterator code, it should be possible to return all column qualifiers with a column
> family within a given range.
> An example of this is finding all terms where NAME=Joe and (AGE>=30 && AGE<60)
> and STATE!=MD. If an example is provided, numerical types like the age could easily be encoded
> using the new Lexicoders.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira