hadoop-hdfs-user mailing list archives

From Adam Phelps <...@opendns.com>
Subject Re: Datanode won't start with bad disk
Date Thu, 24 Mar 2011 17:47:46 GMT
For reference, this is running hadoop 0.20.2 from the CDH3B4 distribution.

- Adam

On 3/24/11 10:30 AM, Adam Phelps wrote:
> We have a bad disk on one of our datanode machines. Although we have
> dfs.datanode.failed.volumes.tolerated set to 2 and saw no problem while
> the DataNode process was running, we hit a problem when we needed to
> restart the DataNode process:
>
> 2011-03-24 16:50:20,071 WARN org.apache.hadoop.util.DiskChecker:
> Incorrect permissions were set on /var/lib/stats/hdfs/4, expected:
> rwxr-xr-x, while actual: ---------. Fixing...
> 2011-03-24 16:50:20,089 INFO org.apache.hadoop.util.NativeCodeLoader:
> Loaded the native-hadoop library
> 2011-03-24 16:50:20,091 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode: EPERM: Operation not
> permitted
>
> In this case /var/lib/stats/hdfs/4 is the mount point for the bad disk.
> It gets that permission error because we have the mount directory set to
> be immutable:
>
> root@s3:/var/log/hadoop# lsattr /var/lib/stats/hdfs/
> ------------------- /var/lib/stats/hdfs/2
> ----i------------e- /var/lib/stats/hdfs/4
> ------------------- /var/lib/stats/hdfs/3
> ------------------- /var/lib/stats/hdfs/1
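[Editor's note: the lsattr output above can be checked mechanically before a DataNode restart. The helper below is purely illustrative and not part of Hadoop; it just parses lsattr-style lines and flags any directory with the immutable ('i') bit set, which is what makes the DataNode's permission fix-up fail with EPERM.]

```python
def immutable_dirs(lsattr_output):
    """Return paths from `lsattr` output whose attribute field
    contains the immutable ('i') flag."""
    flagged = []
    for line in lsattr_output.strip().splitlines():
        # lsattr prints: <attribute-flags> <path>
        attrs, path = line.split(None, 1)
        if "i" in attrs:
            flagged.append(path)
    return flagged

sample = """\
------------------- /var/lib/stats/hdfs/2
----i------------e- /var/lib/stats/hdfs/4
------------------- /var/lib/stats/hdfs/3
------------------- /var/lib/stats/hdfs/1
"""
print(immutable_dirs(sample))  # prints ['/var/lib/stats/hdfs/4']
```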
>
> We set the mount directory immutable because we'd previously seen HDFS
> write straight to the local disk when a disk couldn't be mounted.
>
> HDFS is supposed to be able to handle failed disks, but it doesn't seem
> to be doing the right thing in this case. Is this a known problem, or is
> there some other way we should be configuring things to allow the
> DataNode to come up in this situation?
>
> (clearly we can remove the mount point from hdfs-site.xml, but that
> doesn't feel like the correct solution)
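[Editor's note: for reference, a sketch of the relevant hdfs-site.xml entries for this setup. The data-dir list is assumed from the lsattr output above; the property names are those used by Hadoop 0.20.x / CDH3.]

```xml
<!-- Illustrative fragment; paths assumed from the lsattr output above -->
<property>
  <name>dfs.data.dir</name>
  <value>/var/lib/stats/hdfs/1,/var/lib/stats/hdfs/2,/var/lib/stats/hdfs/3,/var/lib/stats/hdfs/4</value>
</property>
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>2</value>
</property>
```

Removing /var/lib/stats/hdfs/4 from dfs.data.dir is the workaround being discussed, but as noted it defeats the point of failed.volumes.tolerated.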
>
> Thanks
> - Adam
>

