accumulo-user mailing list archives

From John Vines <vi...@apache.org>
Subject Re: BatchScanner sort question
Date Fri, 25 Oct 2013 17:03:26 GMT
The batch scanner works by fetching batches from all tablets in the scan.
Consecutive batches therefore usually arrive out of sorted order relative
to one another. Because batch boundaries are based solely on individual
key-value pairs, a batch can end mid-row, and the next key you receive can
belong to a completely different row, possibly also mid-row. If you need
each row returned in its entirety, use the whole row iterator
(WholeRowIterator).

tl;dr: Option 2 is accurate, but you can force Option 1 with the whole row
iterator.
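
To make the difference concrete, here is a toy simulation in plain Java. It
is not Accumulo code and makes no Accumulo API calls: the class and method
names are hypothetical, entries are plain "Row:CF:CQ:Value" strings, the
batch size of 1 is unrealistically small, and round-robin interleaving only
stands in for whichever tablet happens to answer next. It assumes rows A and
B live on one tablet and row C on another, and it models the whole row
iterator loosely as "collapse each row into a single entry before batching".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only; none of these names exist in Accumulo.
class BatchOrderSketch {

    /** Cuts one tablet's sorted entries into fixed-size batches, ignoring row boundaries. */
    static List<List<String>> toBatches(List<String> entries, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < entries.size(); i += batchSize) {
            int end = Math.min(i + batchSize, entries.size());
            batches.add(new ArrayList<>(entries.subList(i, end)));
        }
        return batches;
    }

    /** Takes one batch from each tablet in turn until every tablet is drained. */
    static List<String> interleave(List<List<List<String>>> perTablet) {
        int rounds = 0;
        for (List<List<String>> tablet : perTablet) {
            rounds = Math.max(rounds, tablet.size());
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < rounds; i++) {
            for (List<List<String>> tablet : perTablet) {
                if (i < tablet.size()) {
                    out.addAll(tablet.get(i));
                }
            }
        }
        return out;
    }

    /** Collapses all entries of a row into one entry, so a batch can never split a row. */
    static List<String> wholeRows(List<String> entries) {
        Map<String, List<String>> rows = new LinkedHashMap<>();
        for (String entry : entries) {
            String row = entry.substring(0, entry.indexOf(':'));
            rows.computeIfAbsent(row, r -> new ArrayList<>()).add(entry);
        }
        List<String> out = new ArrayList<>();
        for (List<String> cells : rows.values()) {
            out.add(String.join(" | ", cells));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tablet1 = Arrays.asList("A:CF1:CQ1:1", "A:CF2:CQ2:2", "B:CF1:CQ1:1");
        List<String> tablet2 = Arrays.asList("C:CF1:CQ1:1");

        // Plain batching: row A is split by an entry from row C (Option 2).
        System.out.println(interleave(Arrays.asList(
                toBatches(tablet1, 1), toBatches(tablet2, 1))));

        // Rows collapsed first: rows are still unsorted, but each stays intact (Option 1).
        System.out.println(interleave(Arrays.asList(
                toBatches(wholeRows(tablet1), 1), toBatches(wholeRows(tablet2), 1))));
    }
}
```

With this input the first line printed has the same ordering as the Option 2
example in the question, and the second keeps both entries of row A together
while still returning C before B, as in Option 1.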


On Fri, Oct 25, 2013 at 12:59 PM, Peter Rainer <rainer.peter@gmail.com> wrote:

> Hi,
>
> in the BatchScanner JavaDoc it says "Also only use this *when you do not
> care about the returned data being in sorted order*. If you want to
> lookup a few ranges and expect those ranges to contain a lot of data, then
> use the Scanner instead. Also, the Scanner will return data in sorted
> order, this will not."
>
> I'm not 100% sure how to interpret this, so I was wondering if any of
> you could help me clarify it:
>
> *Option 1)*
> Rows are not sorted, but all Key/Value Pairs with the same Row Key are in
> sequence
>
> Example:
> Format: Key:CF:CQ:Value
> A:CF1:CQ1:1
> A:CF2:CQ2:2
> C:CF1:CQ1:1
> B:CF1:CQ1:1
>
> *Option 2)*
> Rows are not sorted and not even Key/Value Pairs with the same Row Key are
> in sequence
>
> Example:
> Format: Key:CF:CQ:Value
> A:CF1:CQ1:1
> C:CF1:CQ1:1
> A:CF2:CQ2:2
> B:CF1:CQ1:1
>
>
> Thanks,
> Peter
>
>
