hbase-issues mailing list archives

From "Appy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-19290) Reduce zk request when doing split log
Date Mon, 27 Nov 2017 21:03:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16267542#comment-16267542 ]

Appy commented on HBASE-19290:


I still don't get it. Am starting to feel thick headed on this one :). But I am willing to set
it aside in order to make progress here. Feel free to commit as far as I am concerned. But let
me try to illustrate my question one last time.

bq. The other splitter is available (gated by the return value from calculateAvailableSplitters()),
hence another call to grabTask() should be allowed, right?
Of course we should do another call. But the confusion is about the throttling: why do we
throttle (extra sleep) when grabbedTask == 0 and not when grabbedTask == 1? It's not obvious,
at least to me.

bq. But when grabbedTask = 1, and we still keep failing to grab tasks, it will end the for
loop and enter the while (seq_start == taskReadySeq.get()) {} loop; does this have any problem?
"....it will end the for loop and enter the while...." happens even when grabbedTask = 0. Then
why do we need {{if (grabbedTask == 0 && ...)}}?
I know you said earlier that taskReadySeq will keep increasing as zk nodes keep updating,
so the wait() on that condition may not provide much throttling when there's a lot of churn.
But again, that applies irrespective of the value of grabbedTask.

Can you please walk me through a full case explaining the need for that "if" condition and
why the value grabbedTask == 0 is special?
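For reference, a minimal runnable sketch of the loop shape being debated. This is simplified and renamed from the real split-log worker code: the sleep duration, the `moreTasksLeft` flag, and the helper bodies are illustrative assumptions, not HBase's actual implementation.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class TaskLoopSketch {
    // Stand-in for the coordination sequence number bumped on zk updates.
    static final AtomicInteger taskReadySeq = new AtomicInteger();

    /** The condition under discussion: the extra sleep fires only when no task was grabbed. */
    static boolean shouldThrottle(int grabbedTask, boolean moreTasksLeft) {
        return grabbedTask == 0 && moreTasksLeft;
    }

    static void taskLoop(List<String> paths, int availableSplitters) throws InterruptedException {
        int seq_start = taskReadySeq.get();
        int grabbedTask = 0;
        for (String path : paths) {
            if (availableSplitters - grabbedTask <= 0) {
                break; // no splitter capacity left
            }
            if (grabTask(path)) {
                grabbedTask++;
            }
        }
        // The questioned throttle: only a pass with grabbedTask == 0 sleeps ...
        if (shouldThrottle(grabbedTask, true)) {
            Thread.sleep(100);
        }
        // ... yet every pass, whatever grabbedTask was, then blocks here
        // until taskReadySeq moves on.
        synchronized (taskReadySeq) {
            while (seq_start == taskReadySeq.get()) {
                taskReadySeq.wait(1000);
            }
        }
    }

    static boolean grabTask(String path) {
        return false; // placeholder; the real method races other workers for the zk task node
    }
}
```

The sketch makes the question concrete: both the grabbedTask == 0 and grabbedTask == 1 passes fall through to the same wait-on-taskReadySeq loop, so the special-casing of zero is the only asymmetry.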

> Reduce zk request when doing split log
> --------------------------------------
>                 Key: HBASE-19290
>                 URL: https://issues.apache.org/jira/browse/HBASE-19290
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: binlijin
>            Assignee: binlijin
>         Attachments: HBASE-19290.master.001.patch, HBASE-19290.master.002.patch, HBASE-19290.master.003.patch,
> We observe that once the cluster has 1000+ nodes, when hundreds of nodes abort and start
split log, the split is very slow, and we find the regionservers and master waiting on the
zookeeper responses, so we need to reduce zookeeper requests and pressure for big clusters.
> (1) Reduce requests to rsZNode: every call to calculateAvailableSplitters gets rsZNode's
children from zookeeper, which is heavy when the cluster is huge. This patch reduces those requests.

> (2) When the regionserver already has the max number of split tasks running, it may still try to grab tasks
and issue zookeeper requests; we should sleep and wait until we can grab tasks again.
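As a concrete illustration of point (1), here is a hypothetical caching sketch: rather than listing rsZNode's children on every calculateAvailableSplitters call, keep the count and refresh it only after a TTL expires. The class name, TTL approach, and `fetchChildCountFromZk` helper are assumptions for illustration; the actual patch may reduce requests differently.

```java
// Illustrative only: not HBase's actual API or the patch's actual mechanism.
public class CachedSplitterCounter {
    private final long ttlMs;
    // Start far in the past so the first call always fetches
    // (Long.MIN_VALUE / 2 avoids overflow in the subtraction below).
    private long lastFetchMs = Long.MIN_VALUE / 2;
    private int cachedCount;
    int fetchCalls = 0; // visible for illustration

    public CachedSplitterCounter(long ttlMs) {
        this.ttlMs = ttlMs;
    }

    /** Returns a possibly-stale child count, hitting zookeeper at most once per TTL. */
    public synchronized int availableSplitters(long nowMs) {
        if (nowMs - lastFetchMs >= ttlMs) {
            cachedCount = fetchChildCountFromZk(); // the expensive getChildren round trip
            lastFetchMs = nowMs;
        }
        return cachedCount;
    }

    /** Stand-in for listing rsZNode's children and counting them. */
    protected int fetchChildCountFromZk() {
        fetchCalls++;
        return 3; // pretend three regionservers are registered
    }
}
```

The trade-off is staleness: a worker may briefly over- or under-estimate available splitters, which is acceptable here because grabTask already races safely for each task node.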

This message was sent by Atlassian JIRA
