Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Wed, 31 Dec 2014 04:10:14 +0000 (UTC)
From: "cuijianwei (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.12764026.1419863132000.119737.1419999014727@Atlassian.JIRA>
In-Reply-To: <JIRA.12764026.1419863132000@Atlassian.JIRA>
References: <JIRA.12764026.1419863132000@Atlassian.JIRA>
 <JIRA.12764026.1419863132189@arcas>
Subject: [jira] [Commented] (HBASE-12770) Don't transfer all the queued
 hlogs of a dead server to the same alive server
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


    [ https://issues.apache.org/jira/browse/HBASE-12770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261889#comment-14261889 ]

cuijianwei commented on HBASE-12770:
------------------------------------

Thanks for your concern. I agree that knowing the queue depths of peers would help balance replication load among region servers. The coordinator could be the master; on the other hand, I am not sure whether we can keep the replication load for peers stably balanced without a central coordinator. As an alternative, if every RS reports its queue depth, each RS could know the total queued hlog count and compute the average replication load from the current number of alive RSs when transferring queues. This might help an RS better decide whether to transfer a queue. I will work out the details and give it a try.
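
To make the idea above concrete, here is a minimal sketch, not a patch. It assumes every alive RS can see the queue depths reported by the others (for example through ZooKeeper; how they are published is left open). The class and method names (ReplicationLoadSketch, targetAverage, shouldClaimQueue) and the "claim only while below the average" rule are hypothetical and are not existing HBase code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: decide locally whether to take over a dead server's queue. */
public class ReplicationLoadSketch {

    /**
     * Average number of queued hlogs per alive server once the dead
     * server's backlog has been spread over the alive servers.
     */
    public static double targetAverage(Map<String, Integer> aliveDepths, int deadServerBacklog) {
        if (aliveDepths.isEmpty()) {
            throw new IllegalArgumentException("no alive region servers");
        }
        long total = deadServerBacklog;
        for (int depth : aliveDepths.values()) {
            total += depth;
        }
        return (double) total / aliveDepths.size();
    }

    /**
     * Assumed rule: an RS claims another queue only while its own reported
     * depth is below the target average, so heavily loaded servers back off.
     */
    public static boolean shouldClaimQueue(String self, Map<String, Integer> aliveDepths,
                                           int deadServerBacklog) {
        Integer ownDepth = aliveDepths.get(self);
        if (ownDepth == null) {
            throw new IllegalArgumentException("unknown server: " + self);
        }
        return ownDepth < targetAverage(aliveDepths, deadServerBacklog);
    }

    public static void main(String[] args) {
        // Made-up queue depths, for illustration only.
        Map<String, Integer> depths = new HashMap<>();
        depths.put("rs1", 10);
        depths.put("rs2", 200);
        depths.put("rs3", 30);
        int deadServerBacklog = 120;
        for (String rs : depths.keySet()) {
            System.out.println(rs + " claims a queue: "
                + shouldClaimQueue(rs, depths, deadServerBacklog));
        }
    }
}
```

With this rule, whenever the dead server has a non-empty backlog, the least loaded alive RS is always below the target average, so at least one server keeps claiming queues. The reported depths can be stale, which is one reason a central coordinator might still be needed for stable balance.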

> Don't transfer all the queued hlogs of a dead server to the same alive server
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-12770
>                 URL: https://issues.apache.org/jira/browse/HBASE-12770
>             Project: HBase
>          Issue Type: Improvement
>          Components: Replication
>            Reporter: cuijianwei
>            Priority: Minor
>
> When a region server goes down (or the cluster restarts), all of its hlog queues are transferred to the same alive region server. In a shared cluster, we might create several peers replicating data to different peer clusters. Many hlogs might be queued for these peers for several reasons: some peers might be disabled, errors from a peer cluster might block replication, or the replication sources might fail to read some hlogs because of HDFS problems. Then, if the server goes down or is restarted, another alive server takes over all the replication jobs of the dead server. This might put heavy pressure on the resources (network/disk read) of the alive server, and a single server is also not fast enough to replicate the queued hlogs. If that alive server goes down too, all of its replication jobs, including those taken over from other dead servers, are once again transferred in full to another alive server. This might leave one server with a large number of queued hlogs (in our shared cluster, we have seen one server with thousands of hlogs queued for replication). As an alternative, would it be reasonable for an alive server to transfer only one peer's hlogs from the dead server at a time? Other alive region servers would then have the opportunity to transfer the hlogs of the remaining peers. This might also help the queued hlogs be processed faster. Any discussion is welcome.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)