accumulo-user mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: strategies beyond intersecting iterators?
Date Mon, 02 Jul 2012 09:55:53 GMT
On Sun, Jul 1, 2012 at 11:57 PM, Sukant Hajra <qn2b6c2b9w@snkmail.com> wrote:
> Excerpts from Sukant Hajra's message of Thu Jun 28 15:49:11 -0500 2012:
>>
>> The Accumulo documentation alludes to the problem a little:
>>
>>     If the results are unordered this is quite effective as the first results
>>     to arrive are as good as any others to the user.
>>
>> In our case, order matters because we want the last results without pulling in
>> everything.
>
> Actually, I was just thinking about this a little.  I don't know if this is
> specified in the documentation, but is there /any/ reliable (deterministic)
> ordering for the values returned by intersecting iterators?

Unrelated to the intersecting iterator: when using the batch scanner,
you cannot expect results in order.  The batch scanner sends queries
out to tablet servers in parallel.  As batches of key/values are
returned from the tablet servers, they are immediately made available
to the client.  Therefore the client will iterate over interleaved
key/values from different tablets.  The batch scanner is usually used
with the intersecting iterator to parallelize scans.  I think this is
documented in the batch scanner javadocs.

If the regular scanner were used, then the client would see the key
values in the order returned by the iterator.  However, only one
tablet would be scanned at a time.
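
To make the difference concrete, here is a minimal sketch in plain Java.
It is only a simulation, not Accumulo code: the class ScanOrderSketch,
its two methods, and the in-memory "tablets" are all hypothetical
stand-ins.  One worker thread per tablet pushes keys to a shared queue
as soon as they are produced, which roughly mimics how batch scanner
results reach the client, while the sequential version reads one tablet
at a time like the regular scanner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical simulation; none of these names come from the Accumulo API.
public class ScanOrderSketch {

    // Regular-scanner analogue: read one tablet at a time, in tablet order.
    public static List<String> sequentialScan(List<List<String>> tablets) {
        List<String> out = new ArrayList<>();
        for (List<String> tablet : tablets) {
            out.addAll(tablet);
        }
        return out;
    }

    // Batch-scanner analogue: one worker per tablet; each key is handed to
    // the client queue as soon as it is produced, so keys from different
    // tablets may interleave in an unpredictable order.
    public static List<String> parallelScan(List<List<String>> tablets) {
        BlockingQueue<String> arrivals = new LinkedBlockingQueue<>();
        ExecutorService pool =
            Executors.newFixedThreadPool(Math.max(1, tablets.size()));
        for (List<String> tablet : tablets) {
            pool.submit(() -> {
                for (String key : tablet) {
                    arrivals.add(key);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("simulated scan timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("simulated scan interrupted", e);
        }
        // Arrival order, exactly as the client would have iterated it.
        return new ArrayList<>(arrivals);
    }

    public static void main(String[] args) {
        List<List<String>> tablets = Arrays.asList(
            Arrays.asList("a1", "a2", "a3"),
            Arrays.asList("m1", "m2", "m3"),
            Arrays.asList("t1", "t2", "t3"));
        System.out.println("sequential: " + sequentialScan(tablets));
        System.out.println("parallel:   " + parallelScan(tablets));
    }
}
```

The sequential output is always in key order here because the simulated
tablets are listed in key order.  The parallel output contains the same
keys, but the order can change from run to run, so application logic
should not depend on it.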

>
> If there is, would it be horribly ill-advised to rely on this ordering for
> application logic if we got clever with our schema?
>
> Also, if someone could reply with the exact algorithm for this ordering, it
> would help put less burden on us to reverse engineer and/or read the source
> code correctly.
>
> Thanks for your help,
> Sukant
