accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <>
Subject [jira] [Updated] (ACCUMULO-3710) Scanning with many singleton ranges crashes tserver
Date Thu, 14 May 2015 18:25:01 GMT


Josh Elser updated ACCUMULO-3710:
    Fix Version/s: 1.8.0

> Scanning with many singleton ranges crashes tserver
> ---------------------------------------------------
>                 Key: ACCUMULO-3710
>                 URL:
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client, tserver
>    Affects Versions: 1.6.1
>            Reporter: Dylan Hutchison
>             Fix For: 1.8.0
> Setup: single-node standalone 1.6.1 Accumulo instance.
> Use case: scan ~1M individual rows, scattered across a ~15GB table.  
> The following steps crash the TabletServer:
> 1. Gather a List of Range objects, each one a singleton range spanning an entire row.
> 2. Create a BatchScanner with one read thread.
> 3. Set the ranges via BatchScanner.setRanges().
> 4. Start iterating through the scanner.
> One solution is to batch the reads into groups of ~10k ranges.
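The client-side workaround mentioned above (reading in groups of ~10k ranges) could be sketched roughly as follows. This is a minimal, library-free illustration: `RangeBatches` and `partition` are hypothetical names that do not come from Accumulo, and the element type is left generic so the snippet compiles without the Accumulo jars.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Accumulo: splits a large list of ranges
// into consecutive batches so that no single scan carries all of them.
final class RangeBatches {
    private RangeBatches() {}

    /** Returns consecutive sublists of {@code items}, each holding at most {@code batchSize} elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

With a `List<Range>` in hand, each batch returned by `partition(ranges, 10_000)` would presumably be passed to `BatchScanner.setRanges()` and fully iterated before moving on to the next batch. The 10k figure is the reporter's estimate, not a tuned value.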
> Comment from Josh Elser:
> {quote}
> Taking a quick glance at the code, it looks like this would be a good place to do some
> optimization in the BatchScanner's impl (TabletServerBatchReaderImpl). The BatchScanner will
> bin the ranges to the tablets and the servers hosting those tablets. Normally, this would
> be spread out, but in your single-server case, all 1M rows would go to a single TabletServer
> in one RPC call.
> I'm guessing a good optimization here would be to check the size of a batch of Ranges
> for a single tabletserver, and when above a certain threshold, split the batch in half and
> try to reprocess each half (the recursion would naturally keep splitting until we get down
> to some high-watermark).
> Point being, if your client VM constructed the Ranges without issue, the BatchScanner
> impl should be smart enough to not knock over a TabletServer.
> {quote}
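The split-in-half idea from the comment above might look something like the sketch below. It is not the TabletServerBatchReaderImpl code and uses no Accumulo classes: `RecursiveBatchSplitter`, `submit`, `maxBatchSize`, and the `sender` callback (a stand-in for the per-tabletserver RPC) are all hypothetical names chosen for illustration.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the proposed optimization: a batch destined for one
// server is halved recursively until each piece is at or below a threshold.
final class RecursiveBatchSplitter {
    private RecursiveBatchSplitter() {}

    /**
     * Hands {@code batch} to {@code sender} in pieces of at most
     * {@code maxBatchSize} elements, preserving the original order.
     */
    static <T> void submit(List<T> batch, int maxBatchSize, Consumer<List<T>> sender) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        if (batch.isEmpty()) {
            return; // nothing to send
        }
        if (batch.size() <= maxBatchSize) {
            sender.accept(batch); // small enough: one call for this piece
            return;
        }
        // Too large: split in half and reprocess each half.
        int mid = batch.size() / 2;
        submit(batch.subList(0, mid), maxBatchSize, sender);
        submit(batch.subList(mid, batch.size()), maxBatchSize, sender);
    }
}
```

Because of the halving, the pieces can end up noticeably smaller than the threshold (a batch just over the limit is split into two pieces of about half the limit each). Whether that matters in practice would depend on the per-RPC overhead, which this sketch does not model.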
> Verified to cause an OOME, per tserver_localhost.out:
> {quote}
> #
> # java.lang.OutOfMemoryError: Java heap space
> # -XX:OnOutOfMemoryError="kill -9 %p"
> #   Executing /bin/sh -c "kill -9 12833"...
> {quote}

This message was sent by Atlassian JIRA