From: "Andrew Purtell (JIRA)"
To: issues@hbase.apache.org
Date: Mon, 29 Apr 2019 22:48:00 +0000 (UTC)
Subject: [jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

    [ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829796#comment-16829796 ]

Andrew Purtell commented on HBASE-22301:
----------------------------------------

I sent an email to dev@hbase titled "Trunk only commits are a waste of everyone's time". I am making this claim on that thread (smile). Let's take any response to that to the email thread.

Back soon with a patch for branch-1 that takes the union of this approach and HBASE-21806.

> Consider rolling the WAL if the HDFS write pipeline is slow
> -----------------------------------------------------------
>
>          Key: HBASE-22301
>          URL: https://issues.apache.org/jira/browse/HBASE-22301
>      Project: HBase
>   Issue Type: Improvement
>   Components: wal
>     Reporter: Andrew Purtell
>     Assignee: Andrew Purtell
>     Priority: Minor
>      Fix For: 3.0.0, 1.5.0, 2.3.0
>
>  Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch
>
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering a gray failure, not an outright outage. HDFS operations, notably syncs, are abnormally slow on pipelines which include this subset of hosts. If the regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be consumed waiting for acks from the datanodes in the pipeline (recall that some of them are sick). Imagine a write-heavy application distributing load uniformly over the cluster at a fairly high rate. With the WAL subsystem slowed by HDFS-level issues, all handlers can be blocked waiting to append to the WAL. Once all handlers are blocked, the application will experience backpressure.
> All (HBase) clients eventually have too many outstanding writes and block.
> Because the application is distributing writes near uniformly in the keyspace, the probability that any given service endpoint will dispatch a request to an impacted regionserver, even a single regionserver, approaches 1.0. So the probability that all service endpoints will be affected approaches 1.0.
> In order to break the logjam, we need to remove the slow datanodes. Although there are HDFS-level monitoring, mechanisms, and procedures for this, we should also attempt to take mitigating action at the HBase layer as soon as we find ourselves in trouble. It would be enough to remove the affected datanodes from the writer pipelines. A super simple strategy that can be effective is described below.
> This is with branch-1 code. I think branch-2's async WAL can mitigate this but may still be susceptible; branch-2's sync WAL is susceptible.
> We already roll the WAL writer if the pipeline suffers the failure of a datanode and the replication factor on the pipeline is too low. We should also consider how much time it took for the write pipeline to complete a sync the last time we measured it, or the max over the interval from now to the last time we checked. If the sync time exceeds a configured threshold, roll the log writer then too. Fortunately we don't need to know which datanode is making the WAL write pipeline slow, only that syncs on the pipeline are too slow and exceeding a threshold. This is enough information to know when to roll it. Once we roll it, we will get three new randomly selected datanodes. On most clusters the probability the new pipeline includes the slow datanode will be low. (And if for some reason it does end up with a problematic datanode again, we roll again.) See the sketch after this quoted description.
> This is not a silver bullet, but it can be a reasonably effective mitigation.
> Provide a metric for tracking when a log roll is requested (and for what reason).
> Emit a log line at log roll time that includes datanode pipeline details for further debugging and analysis, similar to the existing slow FSHLog sync log line.
> If we roll too many times within a short interval of time, this probably means there is a widespread problem with the fleet, so our mitigation is not helping and may be exacerbating those problems or operator difficulties. Ensure log roll requests triggered by this new feature happen infrequently enough not to cause difficulties under either normal or abnormal conditions. A very simple strategy that could work well under both normal and abnormal conditions is to define a fairly lengthy interval, default 5 minutes, and then ensure we do not roll more than once during this interval for this reason.
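To make the proposed mitigation concrete, below is a minimal, self-contained Java sketch of the decision described in the quoted text: request a WAL roll when an observed sync exceeds a configured threshold, but never more than once per fairly lengthy interval, so a fleet-wide problem does not turn into a roll storm. The class name SlowSyncRollPolicy, the method shouldRoll, and the default values (10 second threshold, 5 minute interval) are illustrative assumptions, not the class names or configuration keys used by the actual HBASE-22301 patch.

import java.util.concurrent.atomic.AtomicLong;

/**
 * Illustrative sketch (not the actual HBASE-22301 patch): request a WAL roll
 * when a sync exceeds a threshold, but at most once per configured interval.
 */
public class SlowSyncRollPolicy {

  private final long slowSyncRollThresholdMs; // sync time above which we want to roll
  private final long minRollIntervalMs;       // minimum spacing between slow-sync rolls
  private final AtomicLong lastRollRequestMs = new AtomicLong(0); // 0 = never rolled for this reason

  public SlowSyncRollPolicy(long slowSyncRollThresholdMs, long minRollIntervalMs) {
    this.slowSyncRollThresholdMs = slowSyncRollThresholdMs;
    this.minRollIntervalMs = minRollIntervalMs;
  }

  /**
   * Called with the latest observed sync time (or the max observed since the
   * last check). Returns true if the caller should request a log roll for the
   * "slow sync" reason.
   */
  public boolean shouldRoll(long syncTimeMs, long nowMs) {
    if (syncTimeMs < slowSyncRollThresholdMs) {
      return false; // syncs are fast enough, nothing to do
    }
    long last = lastRollRequestMs.get();
    if (last != 0 && nowMs - last < minRollIntervalMs) {
      return false; // we already rolled recently for this reason; avoid a roll storm
    }
    // Only one caller wins the right to request the roll for this interval.
    return lastRollRequestMs.compareAndSet(last, nowMs);
  }

  public static void main(String[] args) {
    // 10 second sync threshold, at most one slow-sync roll every 5 minutes (values illustrative).
    SlowSyncRollPolicy policy = new SlowSyncRollPolicy(10_000L, 5 * 60_000L);
    long now = System.currentTimeMillis();
    System.out.println(policy.shouldRoll(12_000L, now));       // true  -> request a WAL roll
    System.out.println(policy.shouldRoll(15_000L, now + 10L)); // false -> rolled too recently
  }
}

In HBase terms, the WAL implementation (for example FSHLog on branch-1) would consult such a policy after each sync and, when it returns true, request a log roll, bump the proposed metric, and emit the proposed log line with the current datanode pipeline details.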