Date: Mon, 7 Sep 2015 07:26:47 +0000 (UTC)
From: "Hudson (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-8960) DFS client says "no more good datanodes being available to try" on a single drive failure

[ https://issues.apache.org/jira/browse/HDFS-8960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733333#comment-14733333 ]

Hudson commented on HDFS-8960:
------------------------------

FAILURE: Integrated in HBase-1.3 #152 (See [https://builds.apache.org/job/HBase-1.3/152/])
HBASE-14317 Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL (stack: rev bbafb47f7271449d46b46569ca9f0cb227b44c6e)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/LogRoller.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/DamagedWALException.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestFailedAppendAndSync.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiVersionConcurrencyControl.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiVersionConcurrencyControlBasic.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogWriter.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestWALLockup.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSWALEntry.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALKey.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConcurrencyControl.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogReader.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SyncFuture.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java

> DFS client says "no more good datanodes being available to try" on a single drive failure
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-8960
>                 URL: https://issues.apache.org/jira/browse/HDFS-8960
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.7.1
>        Environment: 
openjdk version "1.8.0_45-internal"
> OpenJDK Runtime Environment (build 1.8.0_45-internal-b14)
> OpenJDK 64-Bit Server VM (build 25.45-b02, mixed mode)
>            Reporter: Benoit Sigoure
>         Attachments: blk_1073817519_77099.log, r12s13-datanode.log, r12s16-datanode.log
>
>
> Since we upgraded to 2.7.1 we regularly see single-drive failures cause widespread problems at the HBase level (with the default 3x replication target).
> Here's an example.  This HBase RegionServer is r12s16 (172.24.32.16) and is writing its WAL to [172.24.32.16:10110, 172.24.32.8:10110, 172.24.32.13:10110] as can be seen by the following occasional messages:
> {code}
> 2015-08-23 06:28:40,272 INFO  [sync.3] wal.FSHLog: Slow sync cost: 123 ms, current pipeline: [172.24.32.16:10110, 172.24.32.8:10110, 172.24.32.13:10110]
> {code}
> A bit later, the second node in the pipeline above is going to experience an HDD failure.
> {code}
> 2015-08-23 07:21:58,720 WARN  [DataStreamer for file /hbase/WALs/r12s16.sjc.aristanetworks.com,9104,1439917659071/r12s16.sjc.aristanetworks.com%2C9104%2C1439917659071.default.1440314434998 block BP-1466258523-172.24.32.1-1437768622582:blk_1073817519_77099] hdfs.DFSClient: Error Recovery for block BP-1466258523-172.24.32.1-1437768622582:blk_1073817519_77099 in pipeline 172.24.32.16:10110, 172.24.32.13:10110, 172.24.32.8:10110: bad datanode 172.24.32.8:10110
> {code}
> And then HBase will go like "omg I can't write to my WAL, let me commit suicide".
> {code}
> 2015-08-23 07:22:26,060 FATAL [regionserver/r12s16.sjc.aristanetworks.com/172.24.32.16:9104.append-pool1-t1] wal.FSHLog: Could not append. Requesting close of wal
> java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[172.24.32.16:10110, 172.24.32.13:10110], original=[172.24.32.16:10110, 172.24.32.13:10110]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:969)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1035)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1184)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:933)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:487)
> {code}
> Whereas this should be mostly a non-event, as the DFS client should just drop the bad replica from the write pipeline.
> This is a small cluster but has 16 DNs, so the failed DN in the pipeline should be easily replaced.  I didn't set {{dfs.client.block.write.replace-datanode-on-failure.policy}} (so it's still {{DEFAULT}}) and didn't set {{dfs.client.block.write.replace-datanode-on-failure.enable}} (so it's still {{true}}).
> I don't see anything noteworthy in the NN log around the time of the failure; it just seems like the DFS client gave up, or threw an exception back to HBase that it wasn't throwing before, or something else, and that made this single drive failure lethal.
> We've occasionally been "unlucky" enough to have a single-drive failure cause multiple RegionServers to commit suicide because they had their WALs on that drive.
> We upgraded from 2.7.0 about a month ago, and I'm not sure whether we were seeing this with 2.7 or not – prior to that we were running in a quite different environment, but this is a fairly new deployment.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
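[Editor's note] For reference, the two client-side knobs named in the report are standard HDFS configuration keys. A minimal hdfs-site.xml sketch of the defaults the reporter describes (these values simply restate the defaults; they are not a fix proposed in this thread):

```xml
<!-- hdfs-site.xml, client side: the defaults described in the report above -->
<configuration>
  <property>
    <!-- When true, the client tries to replace a failed datanode in a write pipeline -->
    <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
    <value>true</value>
  </property>
  <property>
    <!-- One of DEFAULT, NEVER, ALWAYS -->
    <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
    <value>DEFAULT</value>
  </property>
</configuration>
```

Roughly, the DEFAULT policy attempts replacement only when the replication factor is 3 or more and either the pipeline has shrunk to half or fewer nodes or the stream has been hflushed/appended (as WAL writes are), which is why the reporter expected the single bad datanode to be replaced rather than the write to fail.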