accumulo-user mailing list archives

From Rob Povey <...@maana.io>
Subject Re: Is there a sensible way to do this? Sequential Batch Scanner
Date Wed, 28 Oct 2015 20:00:07 GMT
Unfortunately that’s pretty much what I’m doing now, and the results are large enough that
pulling them back and sorting them causes fairly dramatic GC issues.
If I could get them in sorted order, I would no longer need to retain them; I could just process
them and discard them, eliminating my GC issues.
I think the way I’ll end up working around this in the short term is to pull pages of data
from a batch scanner, sort those, then combine the paged results. That should be manageable.
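
A minimal sketch of that paging workaround, assuming the entries can be compared directly. Plain strings stand in for Accumulo keys here, and the class and method names (`PagedSortMerge`, `sortedPages`, `merge`) are hypothetical. Each page is sorted on its own, then the pages are combined with a heap-based k-way merge that hands every element to a callback so the merged output never has to be retained.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

public class PagedSortMerge {

    /** Cuts an unordered stream into pages of at most pageSize items and sorts each page. */
    public static <T extends Comparable<T>> List<List<T>> sortedPages(Iterator<T> unordered, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<List<T>> pages = new ArrayList<>();
        List<T> page = new ArrayList<>(pageSize);
        while (unordered.hasNext()) {
            page.add(unordered.next());
            if (page.size() == pageSize) {
                Collections.sort(page);
                pages.add(page);
                page = new ArrayList<>(pageSize);
            }
        }
        if (!page.isEmpty()) {
            Collections.sort(page);
            pages.add(page);
        }
        return pages;
    }

    /** Tracks the current head of one sorted page plus the remainder of that page. */
    private static final class Cursor<T> {
        T head;
        final Iterator<T> rest;

        Cursor(Iterator<T> it) {
            this.rest = it;
            this.head = it.next();
        }
    }

    /** Merges already-sorted pages and passes each element, in order, to the sink. */
    public static <T extends Comparable<T>> void merge(List<List<T>> pages, Consumer<T> sink) {
        PriorityQueue<Cursor<T>> heap = new PriorityQueue<>((a, b) -> a.head.compareTo(b.head));
        for (List<T> p : pages) {
            if (!p.isEmpty()) {
                heap.add(new Cursor<>(p.iterator()));
            }
        }
        while (!heap.isEmpty()) {
            Cursor<T> c = heap.poll();
            sink.accept(c.head); // process and discard
            if (c.rest.hasNext()) {
                c.head = c.rest.next();
                heap.add(c); // re-insert with the new head
            }
        }
    }
}
```

In this sketch the sorted pages are still held in memory while they are merged; if that is too much, each page could be spilled to disk and read back through its iterator, which is an assumption beyond what is described above.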

Rob Povey

From: Keith Turner <keith@deenlo.com>
Reply-To: "user@accumulo.apache.org" <user@accumulo.apache.org>
Date: Wednesday, October 28, 2015 at 8:04 AM
To: "user@accumulo.apache.org" <user@accumulo.apache.org>
Subject: Re: Is there a sensible way to do this? Sequential Batch Scanner

Will the results always fit into memory? If so, you could put the results from the batch
scanner into an ArrayList and sort it.
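
A rough sketch of that suggestion, written against a generic `Iterable` of map entries rather than the Accumulo classes so it stands alone. A BatchScanner is iterable as `Map.Entry<Key,Value>` and `Key` is comparable, so the same shape should apply; the class and method names below are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class CollectAndSort {

    /** Copies every entry into an ArrayList, then sorts the list by key. */
    public static <K extends Comparable<? super K>, V> List<Map.Entry<K, V>> collectSorted(
            Iterable<Map.Entry<K, V>> results) {
        List<Map.Entry<K, V>> list = new ArrayList<>();
        for (Map.Entry<K, V> e : results) {
            // Defensive copy, in case the source reuses its entry objects.
            list.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        list.sort(Map.Entry.comparingByKey());
        return list;
    }
}
```

This only works while the whole result set fits on the heap, which is the condition the question above is asking about.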

On Tue, Oct 27, 2015 at 6:21 PM, Rob Povey <rob@maana.io> wrote:
What I want is something that behaves like a BatchScanner (i.e. takes a collection of Ranges
in a single RPC), but preserves the scan ordering.
I understand this would greatly impact performance, but in my case I can manually partition
my request on the client, and send one request per tablet.
I can’t use Scanners, because in some cases I have tens of thousands of non-consecutive
ranges.
If I use a single-threaded BatchScanner, and only request data from a single tablet, am I
guaranteed ordering?
This appears to work correctly in my small tests (albeit slower than a single one-thread
BatchScanner call), but I don’t really want to have to rely on it if the semantics aren’t guaranteed.
If not, is there another “efficient” way to do this?
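
The client-side partitioning mentioned above could look roughly like the following. This is a sketch under several assumptions: the table's split points are already available as a sorted list (for example from `TableOperations.listSplits`), each lookup is a single row rather than a range spanning tablets, and plain `String` ordering stands in for Accumulo's byte-wise row ordering. The names `TabletPartitioner` and `groupByTablet` are hypothetical. It only groups the lookups; it says nothing about whether a BatchScanner returns each group in order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TabletPartitioner {

    /**
     * Groups rows by the index of the tablet that would hold them.
     * Tablet i covers (split[i-1], split[i]]; the final tablet holds every row
     * after the last split.
     */
    public static Map<Integer, List<String>> groupByTablet(List<String> sortedSplits, List<String> rows) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (String row : rows) {
            int idx = Collections.binarySearch(sortedSplits, row);
            if (idx < 0) {
                idx = -idx - 1; // insertion point: first split greater than row
            }
            groups.computeIfAbsent(idx, k -> new ArrayList<>()).add(row);
        }
        return groups;
    }
}
```

A range that crosses a split point would have to be clipped into one piece per tablet before grouping, which this sketch leaves out. Tablets can also split or migrate between the time the splits are read and the time the request is sent.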

Thanks

Rob Povey

