Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 343EA200D51 for ; Thu, 23 Nov 2017 04:05:07 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 3270D160BFD; Thu, 23 Nov 2017 03:05:07 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 78BA6160C0F for ; Thu, 23 Nov 2017 04:05:06 +0100 (CET) Received: (qmail 34820 invoked by uid 500); 23 Nov 2017 03:05:05 -0000 Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list issues@hbase.apache.org Received: (qmail 34458 invoked by uid 99); 23 Nov 2017 03:05:05 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 23 Nov 2017 03:05:05 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 40C69180791 for ; Thu, 23 Nov 2017 03:05:04 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -100.002 X-Spam-Level: X-Spam-Status: No, score=-100.002 tagged_above=-999 required=6.31 tests=[RP_MATCHES_RCVD=-0.001, SPF_PASS=-0.001, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id s_0ninTU_-OX for ; Thu, 23 Nov 2017 03:05:03 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTP id CF85460F77 for ; Thu, 23 Nov 2017 03:05:02 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 8E87DE259A for ; Thu, 23 Nov 2017 03:05:01 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id E15FE241B8 for ; Thu, 23 Nov 2017 03:05:00 +0000 (UTC) Date: Thu, 23 Nov 2017 03:05:00 +0000 (UTC) From: "binlijin (JIRA)" To: issues@hbase.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (HBASE-19290) Reduce zk request when doing split log MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Thu, 23 Nov 2017 03:05:07 -0000 [ https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263729#comment-16263729 ] binlijin commented on HBASE-19290: ---------------------------------- We observe that when the cluster is big, regionserver issue too much zookeeper request to get availableRSs from rsZNode and also getTaskList from splitLogZNode. Have more nodes, the get availableRSs is more heavy. > Reduce zk request when doing split log > -------------------------------------- > > Key: HBASE-19290 > URL: https://issues.apache.org/jira/browse/HBASE-19290 > Project: HBase > Issue Type: Improvement > Reporter: binlijin > Assignee: binlijin > Attachments: HBASE-19290.master.001.patch, HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, HBASE-19290.master.004.patch > > > We observe once the cluster has 1000+ nodes and when hundreds of nodes abort and doing split log, the split is very very slow, and we find the regionserver and master wait on the zookeeper response, so we need to reduce zookeeper request and pressure for big cluster. > (1) Reduce request to rsZNode, every time calculateAvailableSplitters will get rsZNode's children from zookeeper, when cluster is huge, this is heavy. This patch reduce the request. > (2) When the regionserver has max split tasks running, it may still trying to grab task and issue zookeeper request, we should sleep and wait until we can grab tasks again. -- This message was sent by Atlassian JIRA (v6.4.14#64029)