accumulo-notifications mailing list archives

From "Corey J. Nolet (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (ACCUMULO-1682) Iterator and example to support intersection of document-partitioned index terms by ranges with lower and upper bounds.
Date Fri, 06 Sep 2013 01:56:51 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759752#comment-13759752
] 

Corey J. Nolet edited comment on ACCUMULO-1682 at 9/6/13 1:55 AM:
------------------------------------------------------------------

I've been playing around with swapping out the source iterator underneath the intersecting
iterator for things that I want to query in ranges. I'm wondering if it would work to take
your composite index approach and actually have the source iterator modify the key (to always
be the lower bound of the range query if in fact the key is within the query range) before
the key ever makes it to the intersecting iterator. 

In other words, let's say the term in question is AGE>=30 && AGE<=40. The intersecting
iterator only needs to know that I care if AGE=30 and should tell each source iterator to
begin there. If the source iterator knows about the range query AGE>=30 && AGE<=40,
it can continue to iterate the keys that match but modify them to AGE=30 when getTopKey()
is called. This way the intersecting iterator sees what it asked for and when the full document
is retrieved, the actual value is pulled out.
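
A minimal sketch of the rewriting idea described above, written in plain Java rather than against the Accumulo iterator API. The class, the Entry type, and the rewriteToLowerBound method are hypothetical names used only for illustration. Each entry stands in for an index key, with the AGE term as the column family and the document id as the column qualifier.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not Accumulo code.
final class RangeRewriteSketch {

    /** One index entry: a term value (column family) and a document id (column qualifier). */
    static final class Entry {
        final int age;
        final String docId;

        Entry(int age, String docId) {
            this.age = age;
            this.docId = docId;
        }

        @Override
        public String toString() {
            return "AGE=" + age + ":" + docId;
        }
    }

    /**
     * Walks entries that are already sorted by age and returns those with
     * lower <= age <= upper, each reported as if its age were the lower bound.
     */
    static List<Entry> rewriteToLowerBound(List<Entry> sortedEntries, int lower, int upper) {
        List<Entry> out = new ArrayList<>();
        for (Entry e : sortedEntries) {
            if (e.age < lower) {
                continue; // before the range: skip
            }
            if (e.age > upper) {
                break; // input is sorted by age, so nothing later can match
            }
            out.add(new Entry(lower, e.docId)); // the consumer only ever sees AGE=lower
        }
        return out;
    }
}
```

The consumer sees only AGE=30 entries, but the document ids come out in whatever order the underlying terms produced them.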

Excuse my ignorance if this is a big no-no. I know that certain optimizations performed on
scans make it very dangerous to modify keys. I can't foresee any conflicts with the order
of the keys in the source iterator since I'm always rounding down and I'm scanning through
increasing keys to start with. I'm assuming this has already been tried and doesn't work.
Am I correct?

UPDATE: Never mind. I now better understand your original point that column qualifiers cannot
be returned in order unless they are somehow sorted. It works when you are querying a discrete
value since the column qualifiers will already be sorted. Skipping through multiple different
column families breaks that guarantee.
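
The ordering problem can be reproduced with a small model, assuming keys sort by column family (the AGE term) and then by column qualifier (the document id). This is a plain Java sketch using a TreeMap, not the Accumulo API. The class and method names and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; this is not Accumulo code.
final class QualifierOrderSketch {

    /** Sample index: AGE term -> sorted document ids, mimicking (family, qualifier) key order. */
    static NavigableMap<Integer, TreeSet<String>> sampleIndex() {
        NavigableMap<Integer, TreeSet<String>> index = new TreeMap<>();
        add(index, 30, "doc3");
        add(index, 30, "doc8");
        add(index, 35, "doc1");
        add(index, 40, "doc2");
        add(index, 40, "doc5");
        return index;
    }

    static void add(NavigableMap<Integer, TreeSet<String>> index, int age, String docId) {
        index.computeIfAbsent(age, k -> new TreeSet<>()).add(docId);
    }

    /** Document ids in the order a scan over lower <= AGE <= upper would encounter them. */
    static List<String> scan(NavigableMap<Integer, TreeSet<String>> index, int lower, int upper) {
        List<String> out = new ArrayList<>();
        for (TreeSet<String> docs : index.subMap(lower, true, upper, true).values()) {
            out.addAll(docs);
        }
        return out;
    }

    /** True if the ids are in non-decreasing lexicographic order. */
    static boolean isSorted(List<String> ids) {
        for (int i = 1; i < ids.size(); i++) {
            if (ids.get(i - 1).compareTo(ids.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }
}
```

With this sample data, scan(index, 30, 30) yields doc3, doc8, which is sorted. scan(index, 30, 40) yields doc3, doc8, doc1, doc2, doc5, which is not, so a consumer that expects increasing document ids from each source cannot rely on this stream.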
                
> Iterator and example to support intersection of document-partitioned index terms by ranges
with lower and upper bounds.
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: ACCUMULO-1682
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1682
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Corey J. Nolet
>            Priority: Minor
>              Labels: proposal
>
> The current IntersectingIterator seeks to discrete terms that are encoded into the column
families to find all column qualifiers that share all of the discrete column families of interest
(with the additional ability to negate some of the column families). Looking at the current
IntersectingIterator code, it should be possible to return all column qualifiers with a column
family within a given range.
> An example of this is finding all terms where NAME=Joe and (AGE>=30 && AGE<60)
and STATE!=MD. If an example is provided, numerical types like the age could easily be encoded
using the new Lexicoders.
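
To make the intended result of the example query concrete, here is a small sketch of its semantics in plain Java. It is not the IntersectingIterator and says nothing about how an iterator would implement the range efficiently. It is one possible reading, assuming the range term is first collapsed into a sorted union of document ids. All names and data are hypothetical.

```java
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; this is not Accumulo code.
final class RangeIntersectionSketch {

    /** Sorted union of document ids whose AGE term satisfies lower <= AGE < upper. */
    static TreeSet<String> docsInRange(NavigableMap<Integer, Set<String>> ageIndex,
                                       int lower, int upper) {
        TreeSet<String> docs = new TreeSet<>();
        for (Set<String> ids : ageIndex.subMap(lower, true, upper, false).values()) {
            docs.addAll(ids);
        }
        return docs;
    }

    /** Evaluates NAME=Joe and (AGE>=lower && AGE<upper) and STATE!=MD over document id sets. */
    static TreeSet<String> query(Set<String> nameJoe,
                                 NavigableMap<Integer, Set<String>> ageIndex,
                                 int lower, int upper,
                                 Set<String> stateMd) {
        TreeSet<String> result = docsInRange(ageIndex, lower, upper);
        result.retainAll(nameJoe);  // intersect with the discrete term
        result.removeAll(stateMd);  // apply the negated term
        return result;
    }
}
```

Collecting the range into a sorted set restores the document id order that the intersection relies on, at the cost of buffering every id in the range.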

