Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Thu, 4 Aug 2016 11:34:20 +0000 (UTC)
From: "Phil Yang (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.12764026.1419863132000.222973.1470310460565@Atlassian.JIRA>
In-Reply-To: <JIRA.12764026.1419863132000@Atlassian.JIRA>
References: <JIRA.12764026.1419863132000@Atlassian.JIRA> <JIRA.12764026.1419863132189@arcas>
Subject: [jira] [Updated] (HBASE-12770) Don't transfer all the queued hlogs
 of a dead server to the same alive server
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
archived-at: Thu, 04 Aug 2016 11:34:22 -0000


     [ https://issues.apache.org/jira/browse/HBASE-12770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phil Yang updated HBASE-12770:
------------------------------
    Attachment: HBASE-12770-branch-1-v1.patch

Upload patch for branch-1

> Don't transfer all the queued hlogs of a dead server to the same alive server
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-12770
>                 URL: https://issues.apache.org/jira/browse/HBASE-12770
>             Project: HBase
>          Issue Type: Improvement
>          Components: Replication
>            Reporter: Jianwei Cui
>            Assignee: Phil Yang
>            Priority: Minor
>         Attachments: HBASE-12770-branch-1-v1.patch, HBASE-12770-trunk.patch, HBASE-12770-v1.patch
>
>
> When a region server goes down (or the cluster restarts), all of its hlog queues are transferred to a single alive region server. In a shared cluster, we might create several peers that replicate data to different peer clusters. Many hlogs can be queued for these peers for several reasons: some peers might be disabled, errors from a peer cluster might block replication, or the replication sources might fail to read some hlogs because of HDFS problems. If the server then goes down or is restarted, another alive server takes over all the replication jobs of the dead server. This can put heavy pressure on the resources (network, disk reads) of the alive server, and a single server cannot replicate the queued hlogs quickly enough. If that alive server goes down as well, all of its replication jobs, including those taken over from other dead servers, are again transferred in full to another alive server. As a result, one server can accumulate a large number of queued hlogs (in our shared cluster, we have seen one server with thousands of hlogs queued for replication).
>
> As an alternative, would it be reasonable for an alive server to transfer only one peer's hlogs from the dead server at a time? Other alive region servers would then have the opportunity to transfer the hlogs of the remaining peers. This may also help the queued hlogs get processed faster. Any discussion is welcome.
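
The proposal above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions, not the attached patch and not HBase's replication code: the class QueueFailoverSketch, the method claimOnePeerAtATime, and the server and peer names are all hypothetical, and the competition between alive servers is modeled as a simple round-robin rather than a real race for the dead server's queues.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical simulation; none of these names are HBase APIs. */
class QueueFailoverSketch {

    /**
     * Distributes a dead server's per-peer replication queues so that each
     * alive server claims at most one peer's queue per pass, instead of a
     * single server taking every queue.
     *
     * @param deadServerQueues peer id -> hlogs queued on the dead server
     * @param aliveServers     servers that take turns claiming a queue
     * @return alive server -> (peer id -> claimed hlogs)
     */
    static Map<String, Map<String, List<String>>> claimOnePeerAtATime(
            Map<String, List<String>> deadServerQueues, List<String> aliveServers) {
        Map<String, Map<String, List<String>>> claimed = new LinkedHashMap<>();
        for (String server : aliveServers) {
            claimed.put(server, new LinkedHashMap<>());
        }
        Deque<String> unclaimedPeers = new ArrayDeque<>(deadServerQueues.keySet());
        // Assumption: servers take turns; a real cluster would have them race.
        while (!unclaimedPeers.isEmpty() && !aliveServers.isEmpty()) {
            for (String server : aliveServers) {
                String peer = unclaimedPeers.poll();
                if (peer == null) {
                    break; // every peer queue has been claimed
                }
                claimed.get(server).put(peer, new ArrayList<>(deadServerQueues.get(peer)));
            }
        }
        return claimed;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dead = new LinkedHashMap<>();
        dead.put("peer1", List.of("hlog.1", "hlog.2"));
        dead.put("peer2", List.of("hlog.1", "hlog.2"));
        dead.put("peer3", List.of("hlog.2"));
        System.out.println(claimOnePeerAtATime(dead, List.of("rs-a", "rs-b")));
    }
}
```

With three peer queues and two alive servers, rs-a ends up with the queues for peer1 and peer3 and rs-b with the queue for peer2, so no single server inherits the whole backlog.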

