hbase-issues mailing list archives

From "Yu Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16132) Scan does not return all the result when regionserver is busy
Date Thu, 30 Jun 2016 03:58:12 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15356395#comment-15356395
] 

Yu Li commented on HBASE-16132:
-------------------------------

Let me add some background here:

This is a problem we ran into on our production cluster, which runs a patched 1.1.2 build.
The steps below reproduce it reliably:
1. Special settings for the test:
{noformat}
hbase.regionserver.handler.count => 1
hbase.ipc.server.max.callqueue.length => 1
hbase.client.scanner.timeout.period => 3000
{noformat}
2. Load enough data into tableA using YCSB
3. Simulate a heavy load that keeps the call queue full and the RS busy: 4 physical
clients, each running 32 YCSB processes with 100 threads per process, all doing random reads
against tableA
4. Meanwhile, issue a scan request against tableA using the attached class (will attach the
file later)

I'm not sure, but I think HBASE-16074 might be caused by the same problem. JFYI [~mantonov]
[~eclark]

> Scan does not return all the result when regionserver is busy
> -------------------------------------------------------------
>
>                 Key: HBASE-16132
>                 URL: https://issues.apache.org/jira/browse/HBASE-16132
>             Project: HBase
>          Issue Type: Bug
>            Reporter: binlijin
>         Attachments: HBASE-16132.patch, HBASE-16132_v2.patch, HBASE-16132_v3.patch, HBASE-16132_v3.patch
>
>
> We have found a corner case: when a regionserver stays busy for a long time, some scanners
> may return null even though they have not scanned all the data.
> In ScannerCallableWithReplicas there is a case that is not handled correctly: when cs.poll
> times out without returning any result, the method returns null, so the scan receives a null
> result and ends.
>  {code}
>     try {
>       Future<Pair<Result[], ScannerCallable>> f = cs.poll(timeout, TimeUnit.MILLISECONDS);
>       if (f != null) {
>         Pair<Result[], ScannerCallable> r = f.get(timeout, TimeUnit.MILLISECONDS);
>         if (r != null && r.getSecond() != null) {
>           updateCurrentlyServingReplica(r.getSecond(), r.getFirst(), done, pool);
>         }
>         return r == null ? null : r.getFirst(); // great we got an answer
>       }
>     } catch (ExecutionException e) {
>       RpcRetryingCallerWithReadReplicas.throwEnrichedException(e, retries);
>     } catch (CancellationException e) {
>       throw new InterruptedIOException(e.getMessage());
>     } catch (InterruptedException e) {
>       throw new InterruptedIOException(e.getMessage());
>     } catch (TimeoutException e) {
>       throw new InterruptedIOException(e.getMessage());
>     } finally {
>       // We get there because we were interrupted or because one or more of the
>       // calls succeeded or failed. In all case, we stop all our tasks.
>       cs.cancelAll();
>     }
>     return null; // unreachable
>  {code}
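
For readers who want to see the fall-through in isolation, here is a minimal sketch of the control flow quoted above. It is not the HBase code: java.util.concurrent.ExecutorCompletionService stands in for the `cs` object in ScannerCallableWithReplicas, String[] stands in for Result[], and the class and method names (PollTimeoutSketch, pollLikeQuotedCode, pollFailingLoudly, slowService) are hypothetical. The second method shows one possible way to make the timeout visible to the caller; it is an assumption for illustration and not necessarily what the attached patches do.

```java
import java.io.InterruptedIOException;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class PollTimeoutSketch {

  /** Same shape as the quoted code: a poll timeout falls through to "return null". */
  public static String[] pollLikeQuotedCode(CompletionService<String[]> cs, long timeoutMs)
      throws InterruptedIOException {
    try {
      Future<String[]> f = cs.poll(timeoutMs, TimeUnit.MILLISECONDS);
      if (f != null) {
        return f.get(); // the future is already complete, so this does not block
      }
    } catch (ExecutionException e) {
      throw new InterruptedIOException(String.valueOf(e.getCause()));
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new InterruptedIOException(e.getMessage());
    }
    // Reached when poll() timed out. The caller cannot tell this apart
    // from "no more rows", so a scan built on it would end early.
    return null;
  }

  /** Hypothetical variant: report the timeout instead of returning null. */
  public static String[] pollFailingLoudly(CompletionService<String[]> cs, long timeoutMs)
      throws InterruptedIOException {
    try {
      Future<String[]> f = cs.poll(timeoutMs, TimeUnit.MILLISECONDS);
      if (f == null) {
        throw new InterruptedIOException("no response within " + timeoutMs + " ms");
      }
      return f.get();
    } catch (ExecutionException e) {
      throw new InterruptedIOException(String.valueOf(e.getCause()));
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new InterruptedIOException(e.getMessage());
    }
  }

  /** Submits one task that takes sleepMs to answer, standing in for a busy regionserver. */
  public static CompletionService<String[]> slowService(ExecutorService pool, long sleepMs) {
    CompletionService<String[]> cs = new ExecutorCompletionService<>(pool);
    cs.submit(() -> {
      Thread.sleep(sleepMs);
      return new String[] {"row-1"};
    });
    return cs;
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      CompletionService<String[]> cs = slowService(pool, 2000);
      String[] rows = pollLikeQuotedCode(cs, 100);
      System.out.println("quoted flow returned null: " + (rows == null));
      try {
        pollFailingLoudly(cs, 100);
      } catch (InterruptedIOException e) {
        System.out.println("explicit timeout: " + e.getMessage());
      }
    } finally {
      pool.shutdownNow(); // interrupt the sleeping task so the JVM can exit
    }
  }
}
```

With the task slower than the poll timeout, the first method returns null even though a row is still on its way, which is the situation the description attributes to a busy regionserver. The second method surfaces the same timeout as an exception that a caller could retry on.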



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
