hadoop-hdfs-user mailing list archives

From Bharath Mundlapudi <bharathw...@yahoo.com>
Subject Re: Datanode won't start with bad disk
Date Thu, 24 Mar 2011 23:08:04 GMT
Also, you will need this patch.
https://issues.apache.org/jira/browse/HADOOP-7040




________________________________
From: Bharath Mundlapudi <bharathwork@yahoo.com>
To: "hdfs-user@hadoop.apache.org" <hdfs-user@hadoop.apache.org>
Sent: Thursday, March 24, 2011 4:00 PM
Subject: Re: Datanode won't start with bad disk


Hi Adam,

I have posted a patch for this problem for Hadoop version 20. Please refer to the following JIRA:

https://issues.apache.org/jira/browse/HDFS-1592

-Bharath



________________________________
From: Adam Phelps <amp@opendns.com>
To: "hdfs-user@hadoop.apache.org" <hdfs-user@hadoop.apache.org>
Sent: Thursday, March 24, 2011 10:30 AM
Subject: Re: Datanode won't start with bad disk

We have a bad disk on one of our datanode machines. Although we have dfs.datanode.failed.volumes.tolerated
set to 2 and saw no problems while the DataNode process was running, we hit a problem when
we needed to restart the DataNode process:
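For reference, the tolerated-failures setting lives in hdfs-site.xml. A minimal fragment, using the property names from the 0.20-era configuration and the volume paths mentioned later in this thread (the paths are this cluster's, not defaults):

```xml
<!-- hdfs-site.xml: let the DataNode start and keep running even if
     up to 2 of the configured data volumes have failed. -->
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>2</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/var/lib/stats/hdfs/1,/var/lib/stats/hdfs/2,/var/lib/stats/hdfs/3,/var/lib/stats/hdfs/4</value>
</property>
```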

2011-03-24 16:50:20,071 WARN org.apache.hadoop.util.DiskChecker: Incorrect permissions were
set on /var/lib/stats/hdfs/4, expected: rwxr-xr-x, while actual: ---------. Fixing...
2011-03-24 16:50:20,089 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop
library
2011-03-24 16:50:20,091 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: EPERM: Operation
not permitted

In this case /var/lib/stats/hdfs/4 is the mount point for the bad disk.  It gets that permission
error because we have the mount directory set to be immutable:

root@s3:/var/log/hadoop# lsattr /var/lib/stats/hdfs/
------------------- /var/lib/stats/hdfs/2
----i------------e- /var/lib/stats/hdfs/4
------------------- /var/lib/stats/hdfs/3
------------------- /var/lib/stats/hdfs/1
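The check-and-fix path that produces the EPERM above can be sketched as follows. This is a simplified Python sketch, not Hadoop's actual DiskChecker code: on startup the directory's mode is compared against the expected one and, on mismatch, a chmod is attempted. On a mount point marked immutable with chattr +i, the chmod itself is refused by the kernel, which is what aborts the DataNode startup.

```python
import os
import stat
import tempfile

def check_and_fix(path, expected=0o755):
    """Simplified sketch (assumption: not Hadoop's real code) of the
    "Incorrect permissions ... Fixing" step in the log above."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode == expected:
        return "ok"
    try:
        # On an immutable (chattr +i) directory this raises EPERM
        # even when running as root.
        os.chmod(path, expected)
    except PermissionError:
        return "EPERM"
    return "fixed"
```

On an ordinary directory the fix simply succeeds; only the immutable flag (or a read-only filesystem) turns it into the fatal EPERM seen here.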

(We set the mount directories immutable because we'd previously seen HDFS simply write to the local disk when a disk couldn't be mounted.)

HDFS is supposed to be able to handle a failed disk, but it doesn't seem to be doing the right
thing in this case. Is this a known problem, or is there some other way we should configure
things so that the DataNode can come up in this situation?

(Clearly we could remove the mount point from hdfs-site.xml, but that doesn't feel like the
correct solution.)

Thanks
- Adam