accumulo-user mailing list archives

From Christopher <>
Subject Re: Iterators returning keys out of scan range
Date Sat, 25 May 2013 17:09:20 GMT
He's talking about iterators that transform keys (we don't have any
built in, IIRC), like those that extend the new TransformingIterator.
The scanner logic is written so that it resumes scanning from the
last key it received. This is important for handling failures and
splits/migrations during a scan. So, in this context, a "reversible
transformation" simply means that when the client tells the tserver's
iterator stack where to resume the scan, the iterator can map what the
client thinks is the starting point back to what it actually was prior
to transformation, so the scan resumes from the correct place. This is
necessary because the client does not know what the data looked like
prior to transformation; it only sees the data returned from the
iterator stack.
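
To make the "reversible" part concrete, here is a rough sketch in plain
Python (not Accumulo code). Everything in it is an assumption for
illustration: the STORED keys, the "x_" column prefix, and the
transform/untransform/scan helpers are hypothetical names, and the real
scanner logic is more involved. It only shows why the iterator needs the
inverse mapping to find the right place to resume.

```python
from bisect import bisect_right

# Hypothetical sorted (row, column) keys held by a single tablet.
STORED = [("r1", "a"), ("r1", "b"), ("r1", "c"), ("r2", "a")]

def transform(key):
    """Forward transformation: rewrite the column, leave the row alone."""
    row, col = key
    return (row, "x_" + col)

def untransform(key):
    """Inverse transformation: map a key the client saw back to the stored key."""
    row, col = key
    assert col.startswith("x_")
    return (row, col[len("x_"):])

def scan(resume_after=None):
    """Yield transformed keys, starting just after the last key the client saw."""
    start = 0
    if resume_after is not None:
        # The client only knows the transformed key, so undo the
        # transformation before locating the resume point.
        start = bisect_right(STORED, untransform(resume_after))
    for key in STORED[start:]:
        yield transform(key)

# The client receives two keys, then the scan is interrupted.
it = scan()
seen = [next(it), next(it)]

# The client resumes from the last (transformed) key it received.
resumed = list(scan(resume_after=seen[-1]))
```

With the inverse available, the keys seen before the interruption plus
the resumed keys cover every stored key exactly once.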

Now, the assumption here is that the key the client *thinks* is the
starting point is in the same tablet as the real starting point.
Otherwise, it doesn't matter whether the transformation is reversible,
because the real starting point could be on a different tablet
entirely (due to splits). To ensure this doesn't happen, make sure
that the transforming iterators you implement do not transform the
RowID portion of the key. If they do, they can send back a special
key that client code understands, telling the client to query a
different tablet server: the one it needs to resume scanning from.
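
A small plain-Python sketch of the tablet problem, reusing the split
points from Adam's example quoted below. The ROWS list and the
tablet_for/remaining helpers are hypothetical, and this is a heavy
simplification of what the client and tserver really do; it is only
meant to show how a changed row can point the resumed scan at the wrong
tablet.

```python
from bisect import bisect_left

# Split points: tablets are ["A","D"], ("D","M"], and ("M", ...).
SPLITS = ["D", "M"]

# Hypothetical rows stored in the table, in sorted order.
ROWS = ["A", "B", "E", "K", "P"]

def tablet_for(row):
    """Index of the tablet whose (inclusive) end row covers this row."""
    return bisect_left(SPLITS, row)

def remaining(last_seen, end):
    """Rows a resumed scan would still visit: after last_seen, up to end."""
    return [r for r in ROWS if last_seen < r <= end]

real_last = "A"      # underlying key the iterator actually read
returned_last = "N"  # key the iterator handed to the client instead

# The two keys live in different tablets.
real_tablet = tablet_for(real_last)
resume_tablet = tablet_for(returned_last)

# Resuming a scan of ["A","M"] from each starting point.
correct = remaining(real_last, "M")
actual = remaining(returned_last, "M")
skipped = [r for r in correct if r not in actual]
```

Here the resumed scan starts past the end of the requested range, so
rows B, E and K are never returned. Keeping the row unchanged keeps the
resume point in the tablet the data actually came from.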

Yes, there should be unit tests, but the unit tests would be against
iterators that actually transform keys in this way... and I don't
think we provide any. That'd be user code.

Christopher L Tubbs II

On Sat, May 25, 2013 at 9:36 AM, David Medinets
<> wrote:
> Is there a unit test exposing this behavior? And what does "reversible
> transformation" mean?
> On Wed, May 1, 2013 at 8:36 PM, Adam Fuchs <> wrote:
>> For all the rest of you on this thread, the big problem you'll run into
>> when returning keys out of range is that the reseeking behavior will skip a
>> bunch of underlying keys (i.e. don't try this at home). For example, say you
>> have tablets ["A","D"], ("D","M"], and ("M","ZZZZ..."]. If you do a query on
>> ["A","M"] and return "N" after seeing the underlying key "A", you may never
>> see keys from the ("D","M"] tablet. A good rule of thumb is to return keys
>> in the same row as the underlying keys that were used to generate them and
>> use a reversible transformation of columns within each row.
