accumulo-user mailing list archives

From Eric Newton <eric.new...@gmail.com>
Subject Re: Iterators that alter key-values
Date Fri, 15 May 2015 18:32:09 GMT
>
> is it the same instance of the iterator object


No, it is not.
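
To make the consequence concrete, here is a minimal sketch in plain Java. It does not use the Accumulo API: the class `ResumeSketch`, the `scan` method, the sample keys, and the "tear down after every batch" policy are all assumptions made up for illustration. It only models the behaviour Jim described, where a fresh iterator is built and re-seeked just past the last emitted key, with no other state carried over.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

// Hypothetical simulation, not Accumulo code.
class ResumeSketch {

    // Scans `table`, tearing the "iterator" down after every `batch` entries.
    // Each round starts from scratch and seeks just past the last key the
    // client saw; that key is the only state that survives a teardown.
    // `maxEmits` caps the output so a non-terminating scan can be observed.
    static List<String> scan(NavigableMap<String, String> table,
                             UnaryOperator<String> keyTransform,
                             int batch, int maxEmits) {
        List<String> emitted = new ArrayList<>();
        String lastEmitted = null;
        while (emitted.size() < maxEmits) {
            NavigableMap<String, String> view =
                lastEmitted == null ? table : table.tailMap(lastEmitted, false);
            Iterator<String> it = view.keySet().iterator();
            if (!it.hasNext()) {
                break; // nothing left after the resume point
            }
            for (int i = 0; i < batch && it.hasNext() && emitted.size() < maxEmits; i++) {
                lastEmitted = keyTransform.apply(it.next());
                emitted.add(lastEmitted);
            }
        }
        return emitted;
    }

    // A key transformation that does not preserve sort order.
    static String reverse(String key) {
        return new StringBuilder(key).reverse().toString();
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put("x1", "a");
        table.put("y2", "b");
        table.put("z3", "c");

        // Keys untouched: every resume lands in the right place.
        System.out.println(scan(table, k -> k, 1, 10));
        // prints [x1, y2, z3]

        // Keys reversed: "1x" sorts before every stored key, so each resume
        // starts over at "x1" and only the cap stops the scan.
        System.out.println(scan(table, ResumeSketch::reverse, 1, 5));
        // prints [1x, 1x, 1x, 1x, 1x]
    }
}
```

In this toy model the second scan never finishes on its own, which is the same shape as the infinite loop Jim mentions. A transformation that maps keys forward instead would silently skip data after a resume.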

On Fri, May 15, 2015 at 2:16 PM, Dave Hardcastle <hardcastle.dave@gmail.com>
wrote:

> Jim,
>
> That explains a lot - I knew that the iterator stack could be resumed in
> the middle of a range, but didn't realise that it used the last emitted key
> to decide where to resume.
>
> Just so I'm clear, when iterators get stopped and later resumed, is it the
> same instance of the iterator object that's restarted (so that I could
> store state in there and use that to help the reseek) or is it a new
> instance of the iterator that has to be able to resume purely on the basis
> of the last emitted key?
>
> As you say though, it's probably best to stick to modifying values only.
>
> Thanks very much,
>
> Dave.
>
> On 15 May 2015 at 18:55, James Hughes <jnh5y@virginia.edu> wrote:
>
>> Hi Dave,
>>
>> The big thing to note is that your iterator stack may get stopped and
>> torn down for various reasons.  As Accumulo recreates the stack, it will
>> call 'seek' with the last emitted key in order to resume.
>>
>> If you are returning keys out of order in an iterator, the 'seek' method
>> needs to be able to undo the transformation and call 'seek' appropriately.
>> That's not impossible, but it isn't trivial.
>>
>> In GeoMesa, we did something like that at one point (without having a
>> smart 'seek').  I enjoyed two days of debugging trying to figure out why
>> medium sized requests would hang.  (There was an infinite loop....)  From
>> that experience, I'd suggest only modifying values.
>>
>> Cheers,
>>
>> Jim
>>
>>
>> On Fri, May 15, 2015 at 1:26 PM, Dave Hardcastle <
>> hardcastle.dave@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I've always assumed that the last iterator in the stack can make
>>> arbitrary changes to keys and values, including not returning the keys in
>>> sorted order. I know that SortedKeyValueIterator says that "anything
>>> implementing this interface should return keys in sorted order" - but I
>>> don't see a good reason that has to be true for the final iterator. This
>>> assumption seems to be backed up by the manual which says that "the only
>>> safe way to generate additional data in an iterator is to alter the current
>>> key-value pair" - it doesn't say that making arbitrary modifications to the
>>> rowkey or key is forbidden.
>>>
>>> I have a situation where I am making a transformation of the rowkey that
>>> may not preserve the ordering of the keys. When I scan for individual
>>> ranges I get the correct results. When I scan for two ranges using a
>>> BatchScanner, I get lots of data back which is not in the ranges I queried
>>> for. I am not explicitly checking that I have not gone beyond the range,
>>> but that should not be necessary as I am not doing any seeking, only
>>> consuming the key-values I receive.
>>>
>>> So, my main question is whether the last iterator is allowed to not
>>> return keys in sorted order?
>>>
>>> Thanks,
>>>
>>> Dave.
>>>
>>
>>
>
