accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Scanning with many singleton ranges?
Date Thu, 02 Apr 2015 22:34:44 GMT
That seems perfectly reasonable to me. I'm surprised to hear the 
tserver crashed.

Taking a quick glance at the code, it looks like this would be a good 
place to do some optimization in the BatchScanner's impl 
(TabletServerBatchReaderImpl). The BatchScanner will bin the ranges to 
the tablets and the servers hosting those tablets. Normally, this would 
be spread out across servers, but in your single-server case, all 1M 
ranges would go to a single TabletServer in one RPC call.

I'm guessing a good optimization here would be to check the size of a 
batch of Ranges for a single tabletserver, and when it is above a certain 
threshold, split the batch in half and try to reprocess each half (the 
recursion would naturally keep splitting until every piece is at or 
below that threshold).
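
To make the splitting idea concrete, here is a rough sketch in plain Java. 
It is not the actual TabletServerBatchReaderImpl code: the class name, the 
method names and the maxBatchSize parameter are all made up for 
illustration, and it works on a generic List so it runs without Accumulo 
on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Accumulo. */
final class RangeBatchSplitter {

    /**
     * Recursively halves a batch until every piece holds at most
     * maxBatchSize elements. The original order is preserved.
     */
    static <T> List<List<T>> split(List<T> batch, int maxBatchSize) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be >= 1");
        }
        List<List<T>> pieces = new ArrayList<>();
        splitInto(batch, maxBatchSize, pieces);
        return pieces;
    }

    private static <T> void splitInto(List<T> batch, int maxBatchSize,
                                      List<List<T>> pieces) {
        if (batch.isEmpty()) {
            return; // nothing to send
        }
        if (batch.size() <= maxBatchSize) {
            // Small enough: this piece would become one request.
            pieces.add(new ArrayList<>(batch));
            return;
        }
        // Too large: split in half and reprocess each half.
        int mid = batch.size() / 2;
        splitInto(batch.subList(0, mid), maxBatchSize, pieces);
        splitInto(batch.subList(mid, batch.size()), maxBatchSize, pieces);
    }
}
```

In a real fix, each resulting piece would presumably be sent to the 
tabletserver as its own request instead of one giant RPC; what the right 
threshold is would need measuring.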

Point being, if your client VM constructed the Ranges without issue, the 
BatchScanner impl should be smart enough to not knock over a TabletServer.

What was the reason the tserver died? OOME? Was there anything at the 
end of the log files or in the .out/.err files?

- Josh

Dylan Hutchison wrote:
> A friend of mine has a use case where he wants to scan ~1M individual
> rows, scattered across a ~15GB table.  He performed the following:
>
> 1. Gather a List of Range objects, each one a singleton range spanning
> an entire row.
> 2. Create a BatchScanner with one read thread.
> 3. Set the ranges via BatchScanner.setRanges()
> 4. Start iterating through the scanner.
>
> Performing these steps crashed the TabletServer for my friend (haven't
> had time to verify it myself yet). We're using a single-node standalone
> 1.6.1 Accumulo instance.
>
> Is this a bad way to use Accumulo?  I advised my friend to batch the
> reads into groups of ~10k ranges and see if that helps.  I wanted to
> check with the community and see if we're doing something weird.  If the
> behavior should have worked, I can try to put together a test case
> reproducing it: one that creates a table with many entries and then
> scans with many ranges.
>
> Thanks,
> Dylan Hutchison
>
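
For the client-side workaround mentioned above (batching the reads into 
groups of ~10k ranges), a minimal sketch of the chunking step could look 
like the following. The class and method names are hypothetical, and it 
uses a generic List rather than Accumulo's Range type so that it stands 
alone.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Accumulo. */
final class RangeChunker {

    /** Cuts items into consecutive chunks of at most chunkSize elements. */
    static <T> List<List<T>> chunk(List<T> items, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        List<List<T>> chunks = new ArrayList<>();
        int start = 0;
        while (start < items.size()) {
            // The last chunk may be shorter than chunkSize.
            int end = start + Math.min(chunkSize, items.size() - start);
            chunks.add(new ArrayList<>(items.subList(start, end)));
            start = end;
        }
        return chunks;
    }
}
```

With Accumulo on the classpath, each chunk of Range objects could then be 
handed to BatchScanner.setRanges() and iterated to completion before 
moving on to the next chunk. The 10k figure is only the starting guess 
from the message above, not a tuned value.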
