Date: Wed, 31 Dec 2014 04:10:14 +0000 (UTC)
From: "cuijianwei (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-12770) Don't transfer all the queued hlogs of a dead server to the same alive server

    [ https://issues.apache.org/jira/browse/HBASE-12770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261889#comment-14261889 ]

cuijianwei commented on HBASE-12770:
------------------------------------

Thanks for your concern. I agree that knowing the queue depths of peers would help balance replication load among region servers. The coordinator could be the master; on the other hand, I am not sure whether there is a way to keep replication load stably balanced across peers without a central coordinator.
Alternatively, if every RS reports its queue depth, each RS could know the total number of queued hlogs and, when transferring queues, compute the average replication load from the current number of alive RSs. This might help the RS decide better whether to transfer a queue. I will work out the details and give it a try.

> Don't transfer all the queued hlogs of a dead server to the same alive server
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-12770
>                 URL: https://issues.apache.org/jira/browse/HBASE-12770
>             Project: HBase
>          Issue Type: Improvement
>          Components: Replication
>            Reporter: cuijianwei
>            Priority: Minor
>
> When a region server goes down (or the cluster restarts), all of its hlog queues are transferred to the same alive region server. In a shared cluster, we might create several peers that replicate data to different peer clusters. Many hlogs might be queued for these peers for several reasons: some peers might be disabled, errors from a peer cluster might block replication, or the replication sources might fail to read some hlogs because of HDFS problems. Then, if the server goes down or is restarted, another alive server takes over all the replication jobs of the dead server. This might put heavy pressure on the resources (network/disk read) of the alive server, and it also does not replicate the queued hlogs fast enough. And if that alive server goes down, all of its replication jobs, including those taken over from other dead servers, are again transferred in full to another alive server, so a single server can end up with a large number of queued hlogs (in our shared cluster, we have seen one server with thousands of hlogs queued for replication). As an alternative, would it be reasonable for the alive server to transfer only one peer's hlogs from the dead server at a time?
> Then other alive region servers would have the opportunity to transfer the hlogs of the remaining peers. This might also help the queued hlogs be processed faster. Any discussion is welcome.



--
This message was sent by Atlassian JIRA (v6.3.4#6332)
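
The two ideas in this thread (claim only one peer's queue at a time, and skip a claim when the server is already above the average load) can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions, not HBase code: the names `try_claim` and `redistribute`, the dictionary layout, and the round-robin order are all hypothetical, and a real implementation would have to coordinate claims through ZooKeeper rather than a shared dictionary.

```python
from typing import Dict, List, Optional, Tuple


def try_claim(server: str,
              remaining: Dict[str, List[str]],
              queue_depths: Dict[str, int]) -> Optional[str]:
    """Let `server` claim at most one peer's queue from the dead server.

    `remaining` maps peer id -> hlogs still unclaimed (hypothetical layout).
    `queue_depths` maps alive server -> hlogs it already has queued.
    Returns the claimed peer id, or None if the server declines.
    """
    pending = sum(len(hlogs) for hlogs in remaining.values())
    # Average load per alive server once everything has been redistributed.
    target = (sum(queue_depths.values()) + pending) / len(queue_depths)
    if not remaining or queue_depths[server] > target:
        return None  # nothing left, or already above average: leave it to others
    peer = min(remaining)  # deterministic pick, for illustration only
    queue_depths[server] += len(remaining.pop(peer))
    return peer


def redistribute(dead_queues: Dict[str, List[str]],
                 queue_depths: Dict[str, int]
                 ) -> Tuple[Dict[str, str], Dict[str, int]]:
    """Simulate alive servers taking turns, one peer queue per turn.

    Returns (peer id -> claiming server, updated queue depths).
    """
    if not queue_depths:
        raise ValueError("no alive servers to take over the queues")
    remaining = {peer: list(hlogs) for peer, hlogs in dead_queues.items()}
    depths = dict(queue_depths)
    assignment: Dict[str, str] = {}
    while remaining:
        # At least one server is at or below the target, so every pass
        # over the servers claims at least one queue and the loop ends.
        for server in sorted(depths):
            peer = try_claim(server, remaining, depths)
            if peer is not None:
                assignment[peer] = server
    return assignment, depths
```

With three peers queued on the dead server and one alive server already heavily loaded, the loaded server declines every turn and the other two servers split the peers between them, instead of one server inheriting all three queues.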