accumulo-user mailing list archives

From David Medinets <david.medin...@gmail.com>
Subject Re: Is there a sensible way to do this? Sequential Batch Scanner
Date Wed, 28 Oct 2015 20:28:43 GMT
Can you write the results back to a temporary Accumulo table?
On Oct 28, 2015 4:00 PM, "Rob Povey" <rob@maana.io> wrote:

> Unfortunately that’s pretty much what I’m doing now, and the results are
> large enough that pulling them back and sorting them causes fairly dramatic
> GC issues.
> If I could get them in sorted order, I would no longer need to retain them;
> I could just process them and discard them, eliminating my GC issues.
> I think the way I’ll end up working around this in the short term is to
> pull pages of data from a batch scanner, sort those, then combine the paged
> results. That should be manageable.
>
> Rob Povey
>
> From: Keith Turner <keith@deenlo.com>
> Reply-To: "user@accumulo.apache.org" <user@accumulo.apache.org>
> Date: Wednesday, October 28, 2015 at 8:04 AM
> To: "user@accumulo.apache.org" <user@accumulo.apache.org>
> Subject: Re: Is there a sensible way to do this? Sequential Batch Scanner
>
> Will the results always fit into memory?  If so, you could put the results
> from the batch scanner into an ArrayList and sort it.
>
> On Tue, Oct 27, 2015 at 6:21 PM, Rob Povey <rob@maana.io> wrote:
>
>> What I want is something that behaves like a BatchScanner (i.e. takes a
>> collection of Ranges in a single RPC), but preserves the scan ordering.
>> I understand this would greatly impact performance, but in my case I can
>> manually partition my request on the client, and send one request per
>> tablet.
>> I can’t use scanners, because in some cases I have tens of thousands of
>> non-consecutive ranges.
>> If I use a single threaded BatchScanner, and only request data from a
>> single Tablet, am I guaranteed ordering?
>> This appears to work correctly in my small tests (albeit slower than a
>> single 1 thread Batch scanner call), but I don’t really want to have to
>> rely on it if the semantic isn’t guaranteed.
>> If not, is there another “efficient” way to do this?
>>
>> Thanks
>>
>> Rob Povey
>>
>>
>
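
The paging workaround Rob describes (pull pages from a batch scanner, sort
each page, then combine the paged results) can be sketched roughly as below.
This is only an illustration under stated assumptions, not code from the
thread: the class PagedSortMerge and the methods sortedPages and mergeInto are
hypothetical names, a plain java.util.Iterator stands in for the unordered
entries a BatchScanner would return, and the sorted pages are kept in memory
here, whereas a real implementation might spill them to disk or to a temporary
table to keep heap pressure down.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

// Hypothetical helper: sorts fixed-size pages, then k-way merges them.
class PagedSortMerge {

    // Drain an unordered iterator into pages of at most pageSize items,
    // sorting each page on its own.
    public static <T extends Comparable<? super T>> List<List<T>> sortedPages(
            Iterator<T> unordered, int pageSize) {
        List<List<T>> pages = new ArrayList<>();
        List<T> page = new ArrayList<>(pageSize);
        while (unordered.hasNext()) {
            page.add(unordered.next());
            if (page.size() == pageSize) {
                Collections.sort(page);
                pages.add(page);
                page = new ArrayList<>(pageSize);
            }
        }
        if (!page.isEmpty()) {          // final, possibly short, page
            Collections.sort(page);
            pages.add(page);
        }
        return pages;
    }

    // Cursor over one sorted page, ordered by its current head element.
    private static final class Cursor<T extends Comparable<? super T>>
            implements Comparable<Cursor<T>> {
        final Iterator<T> rest;
        T head;

        Cursor(Iterator<T> it) {
            rest = it;
            head = it.next();
        }

        @Override
        public int compareTo(Cursor<T> other) {
            return head.compareTo(other.head);
        }
    }

    // k-way merge: repeatedly emit the smallest head among all pages.
    // Each item is handed to the sink in sorted order and not retained.
    public static <T extends Comparable<? super T>> void mergeInto(
            List<List<T>> pages, Consumer<T> sink) {
        PriorityQueue<Cursor<T>> heap = new PriorityQueue<>();
        for (List<T> p : pages) {
            if (!p.isEmpty()) {
                heap.add(new Cursor<>(p.iterator()));
            }
        }
        while (!heap.isEmpty()) {
            Cursor<T> c = heap.poll();
            sink.accept(c.head);
            if (c.rest.hasNext()) {     // advance this page and re-queue it
                c.head = c.rest.next();
                heap.add(c);
            }
        }
    }
}
```

With Accumulo, the element type would presumably be the Key (or a Key/Value
pair compared by Key), and the sink would be whatever processes and discards
each entry. The merge step costs O(n log k) comparisons for n entries across
k pages.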
