Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Thu, 23 Nov 2017 03:05:00 +0000 (UTC)
From: "binlijin (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.13119133.1510902206000.306477.1511406300920@Atlassian.JIRA>
In-Reply-To: <JIRA.13119133.1510902206000@Atlassian.JIRA>
References: <JIRA.13119133.1510902206000@Atlassian.JIRA> <JIRA.13119133.1510902206666@jira-lw-us.apache.org>
Subject: [jira] [Commented] (HBASE-19290) Reduce zk request when doing split
 log
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Thu, 23 Nov 2017 03:05:07 -0000


    [ https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263729#comment-16263729 ] 

binlijin commented on HBASE-19290:
----------------------------------

We observed that on a big cluster each regionserver issues too many ZooKeeper requests, both to get availableRSs from rsZNode and to getTaskList from splitLogZNode. The more nodes the cluster has, the heavier the availableRSs lookup becomes.
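
To illustrate the kind of saving meant here, below is a minimal sketch of caching the rsZNode child count for a short interval instead of reading it from ZooKeeper on every pass. It is not the code from the attached patch. The class name CachedChildCount, the loader callback, and the TTL value are all hypothetical, and the loader stands in for whatever real ZooKeeper call lists the children of rsZNode.

```java
import java.util.function.IntSupplier;
import java.util.function.LongSupplier;

/**
 * Hypothetical helper: remembers an expensive count (for example the number
 * of children under rsZNode) and only reloads it after ttlMs has elapsed.
 */
class CachedChildCount {
  private final IntSupplier loader;   // assumption: wraps the real ZooKeeper lookup
  private final LongSupplier clockMs; // injectable clock, e.g. System::currentTimeMillis
  private final long ttlMs;           // how long a loaded value is trusted

  private int cached;
  private long loadedAtMs;
  private boolean loaded;

  CachedChildCount(IntSupplier loader, LongSupplier clockMs, long ttlMs) {
    this.loader = loader;
    this.clockMs = clockMs;
    this.ttlMs = ttlMs;
  }

  /** Returns the cached count, calling the loader only if the value is missing or stale. */
  synchronized int get() {
    long now = clockMs.getAsLong();
    if (!loaded || now - loadedAtMs >= ttlMs) {
      cached = loader.getAsInt();
      loadedAtMs = now;
      loaded = true;
    }
    return cached;
  }
}
```

The trade-off is staleness: a regionserver may work with a slightly outdated count for up to the TTL, which seems acceptable when the value is only used to estimate how many splitters are available.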

> Reduce zk request when doing split log
> --------------------------------------
>
>                 Key: HBASE-19290
>                 URL: https://issues.apache.org/jira/browse/HBASE-19290
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: binlijin
>            Assignee: binlijin
>         Attachments: HBASE-19290.master.001.patch, HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, HBASE-19290.master.004.patch
>
>
> We observed that once the cluster has 1000+ nodes, when hundreds of nodes abort and start splitting logs, the split is very slow. We found that the regionservers and the master were waiting on ZooKeeper responses, so we need to reduce ZooKeeper requests and pressure on big clusters.
> (1) Reduce requests to rsZNode. Every call to calculateAvailableSplitters gets rsZNode's children from ZooKeeper, which is heavy when the cluster is huge. This patch reduces those requests.
> (2) When a regionserver already has the maximum number of split tasks running, it may still try to grab tasks and issue ZooKeeper requests. It should sleep and wait until it can grab tasks again.
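
Point (2) can be sketched as a slot gate that the worker loop consults before it touches ZooKeeper at all. This is only an illustration under assumptions, not the patch itself: SplitTaskSlots, awaitSlot, and release are hypothetical names, and the real worker may track its running tasks differently.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical gate: a worker asks for a slot before scanning for split
 * tasks, so a fully loaded regionserver waits instead of polling ZooKeeper.
 */
class SplitTaskSlots {
  private final Semaphore slots;

  SplitTaskSlots(int maxConcurrentTasks) {
    this.slots = new Semaphore(maxConcurrentTasks);
  }

  /**
   * Waits up to timeoutMs for a free slot. Returns false if none became
   * available or the thread was interrupted while waiting.
   */
  boolean awaitSlot(long timeoutMs) {
    try {
      return slots.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
      return false;
    }
  }

  /** Called when a split task finishes, freeing its slot. */
  void release() {
    slots.release();
  }

  /** Number of slots currently free. */
  int free() {
    return slots.availablePermits();
  }
}
```

A worker loop would call awaitSlot before listing tasks and release once a task completes, so no ZooKeeper request is issued while every slot is busy.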


--
This message was sent by Atlassian JIRA
(v6.4.14#64029)