accumulo-user mailing list archives

From David Medinets <>
Subject Re: Is there a sensible way to do this? Sequential Batch Scanner
Date Wed, 28 Oct 2015 20:28:43 GMT
Can you write the results back to a temporary Accumulo table?
On Oct 28, 2015 4:00 PM, "Rob Povey" <> wrote:

> Unfortunately that’s pretty much what I’m doing now, and the results are
> large enough that pulling them back and sorting them causes fairly dramatic
> GC issues.
> If I could get them in sorted order, I would no longer need to retain them;
> I could just process and discard them, eliminating my GC issues.
> I think the way I’ll end up working around this in the short term is to
> pull pages of data from a batch scanner, sort those, then combine the paged
> results. That should be manageable.
> Rob Povey
> From: Keith Turner <>
> Date: Wednesday, October 28, 2015 at 8:04 AM
> Subject: Re: Is there a sensible way to do this? Sequential Batch Scanner
> Will the results always fit into memory? If so, you could put the results
> from the batch scanner into an ArrayList and sort it.
> On Tue, Oct 27, 2015 at 6:21 PM, Rob Povey <> wrote:
>> What I want is something that behaves like a BatchScanner (i.e. takes a
>> collection of Ranges in a single RPC), but preserves the scan ordering.
>> I understand this would greatly impact performance, but in my case I can
>> manually partition my request on the client, and send one request per
>> tablet.
>> I can’t use Scanners, because in some cases I have tens of thousands of
>> non-consecutive ranges.
>> If I use a single threaded BatchScanner, and only request data from a
>> single Tablet, am I guaranteed ordering?
>> This appears to work correctly in my small tests (albeit slower than a
>> single one-thread BatchScanner call), but I don’t really want to have to
>> rely on it if the semantics aren’t guaranteed.
>> If not, is there another “efficient” way to do this?
>> Thanks
>> Rob Povey
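
The short-term workaround described in this thread (pull pages from a batch scanner, sort each page, then combine the paged results) amounts to a k-way merge of sorted runs. The following is a rough sketch of only the combine step, in plain Java with no Accumulo dependency. The class name SortedPageMerger and its merge method are hypothetical and not part of any Accumulo API. It assumes each page has already been sorted and that elements are Comparable; with Accumulo the elements would presumably be entries ordered by their Key, which is not shown here.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

/** Hypothetical helper (not an Accumulo class): lazily merges pre-sorted pages. */
final class SortedPageMerger {

    /** One cursor per page: the current head element plus the rest of that page. */
    private static final class Cursor<T> {
        T head;
        final Iterator<T> rest;

        Cursor(Iterator<T> it) {
            this.rest = it;
            this.head = it.next(); // caller guarantees the page is non-empty
        }
    }

    /**
     * Returns an iterator over all elements of all pages in ascending order.
     * Assumes every page is already sorted; the pages themselves are not copied.
     */
    static <T extends Comparable<T>> Iterator<T> merge(List<List<T>> sortedPages) {
        // Min-heap keyed on the smallest unread element of each page.
        PriorityQueue<Cursor<T>> heap = new PriorityQueue<>(
                (Cursor<T> a, Cursor<T> b) -> a.head.compareTo(b.head));
        for (List<T> page : sortedPages) {
            Iterator<T> it = page.iterator();
            if (it.hasNext()) {
                heap.add(new Cursor<>(it));
            }
        }
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return !heap.isEmpty();
            }

            @Override
            public T next() {
                Cursor<T> c = heap.poll();
                if (c == null) {
                    throw new NoSuchElementException();
                }
                T result = c.head;
                if (c.rest.hasNext()) {
                    // Advance this page's cursor and put it back in the heap.
                    c.head = c.rest.next();
                    heap.add(c);
                }
                return result;
            }
        };
    }
}
```

Because the merged iterator hands out one element at a time, the consumer can process and discard each result as it arrives. The pages themselves still have to be held somewhere (in memory or spilled elsewhere) until the merge has consumed them, so this only helps if the pages are individually small enough to sort cheaply.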
