hadoop-user mailing list archives

From Nitin Pawar <nitinpawar...@gmail.com>
Subject Re: Application errors with one disk on datanode getting filled up to 100%
Date Mon, 10 Jun 2013 09:41:25 GMT
When you say the application errors out, does that mean your MapReduce job is
failing? In that case, apart from HDFS space, you will also need to look at the
space in the mapred tmp directory.
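To check that local scratch space, a small script can report usage per directory. This is a minimal sketch, assuming Python 3 on the tasktracker host; the directory paths are hypothetical placeholders for whatever your mapred.local.dir (the Hadoop 1.x property for MapReduce's local scratch directories) actually lists.

```python
import os
import shutil


def report_usage(paths, threshold_pct=90.0):
    """For each directory, return (path, used_pct, free_bytes, over_threshold).

    used_pct is used / total as reported by the OS, so it can differ slightly
    from df on filesystems that keep root-reserved blocks.
    """
    report = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        used_pct = 100.0 * usage.used / usage.total if usage.total else 0.0
        report.append((path, round(used_pct, 1), usage.free, used_pct >= threshold_pct))
    return report


if __name__ == "__main__":
    # Hypothetical paths: replace with the directories listed in your mapred.local.dir.
    local_dirs = ["/data1/mapred/local", "/data2/mapred/local"]
    # Only check directories that exist on this host.
    for path, pct, free, full in report_usage(d for d in local_dirs if os.path.isdir(d)):
        note = "  <-- nearly full" if full else ""
        print(f"{path}: {pct}% used, {free / 2**30:.1f} GiB free{note}")
```

Running it repeatedly while a job is in progress should show whether the scratch directories climb toward the threshold.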

You have roughly 400GB * 4 * 10 = 16TB of disk. Assuming a replication factor
of 3, at most you can store about 5TB of data.
I am also assuming you are not running your program over the entire 5TB
with just 10 nodes.
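The capacity estimate above can be written out as a tiny calculation. This sketch treats 1 TB as 1000 GB and ignores non-DFS usage and any dfs.datanode.du.reserved setting, so the real usable figure will be somewhat lower; it shows the rounded 400GB disks next to the 414GB disks from the original question.

```python
def usable_capacity_tb(disk_gb, disks_per_node, nodes, replication):
    """Return (raw_tb, replication_adjusted_tb), using 1 TB = 1000 GB.

    Ignores non-DFS usage and reserved space, so it is an upper bound.
    """
    raw_gb = disk_gb * disks_per_node * nodes
    return raw_gb / 1000, raw_gb / replication / 1000


if __name__ == "__main__":
    # Rounded figure: 400 GB disks, 4 per node, 10 datanodes, replication 3.
    raw, usable = usable_capacity_tb(400, 4, 10, 3)
    print(f"raw {raw:.1f} TB, usable ~{usable:.1f} TB")  # raw 16.0 TB, usable ~5.3 TB
    # Actual disks from the question: 414 GB each.
    raw, usable = usable_capacity_tb(414, 4, 10, 3)
    print(f"raw {raw:.1f} TB, usable ~{usable:.1f} TB")  # raw 16.6 TB, usable ~5.5 TB
```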

I suspect your cluster's mapred tmp space is filling up while the job
is running.
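On the original question of /data1 reaching 100%: as far as I know, a Hadoop 1.x datanode picks the volume for each new block round-robin across dfs.data.dir, skipping any volume that no longer has room for a block, and does not otherwise balance by free space. A disk that also holds non-HDFS data (for example mapred local dirs or logs) therefore fills first. The sketch below is a simplified model of that behavior, not Hadoop's code; the 100 GB of extra data on /data1, the 30 GB reserve, and the 64 MB block size are illustrative assumptions. It treats dfs.datanode.du.reserved as space each volume must keep free.

```python
def place_blocks(capacity_gb, used_gb, reserved_gb, n_blocks, block_gb=0.0625):
    """Simplified round-robin volume choice for one datanode (a model, not Hadoop code).

    Each new block goes to the next volume, in order, that still has at least
    one block of space left above reserved_gb. Returns the final used space per
    volume (HDFS plus non-HDFS). Raises RuntimeError when no volume can take
    another block, roughly a datanode running out of space.
    block_gb defaults to 64 MB (0.0625 GB).
    """
    used = list(used_gb)
    n = len(capacity_gb)
    cur = 0  # index of the volume to try first for the next block
    for _ in range(n_blocks):
        for step in range(n):
            i = (cur + step) % n
            available = capacity_gb[i] - used[i] - reserved_gb
            if available >= block_gb:
                used[i] += block_gb
                cur = (i + 1) % n  # the next block starts at the following volume
                break
        else:
            raise RuntimeError("no volume has room for another block")
    return used


if __name__ == "__main__":
    caps = [414, 414, 414, 414]   # /data1../data4, GB
    non_dfs = [100, 0, 0, 0]      # hypothetical: 100 GB of non-HDFS data on /data1
    blocks = int(1400 / 0.0625)   # about 1.4 TB of new blocks on this datanode
    for reserve in (0, 30):       # hypothetical dfs.datanode.du.reserved, in GB
        used = place_blocks(caps, non_dfs, reserve, blocks)
        print(f"reserve={reserve} GB:", [f"{100 * u / c:.1f}%" for u, c in zip(used, caps)])
```

With these made-up numbers, /data1 ends at 100.0% when nothing is reserved while the other disks sit at 87.4%; with a 30 GB reserve, /data1 stops at 92.8% and the others reach 89.9%.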

On Mon, Jun 10, 2013 at 3:06 PM, Mayank <mail2mayank@gmail.com> wrote:

> We are running a Hadoop cluster with 10 datanodes and a namenode. Each
> datanode is set up with 4 disks (/data1, /data2, /data3, /data4), with each
> disk having a capacity of 414GB.
> hdfs-site.xml has the following property set:
> <property>
>   <name>dfs.data.dir</name>
>   <value>/data1/hadoopfs,/data2/hadoopfs,/data3/hadoopfs,/data4/hadoopfs</value>
>   <description>Data dirs for DFS.</description>
> </property>
> Now we are facing an issue where /data1 fills up quickly, and we often
> see its usage at 100% with just a few megabytes of free space. This
> issue is currently visible on 7 out of 10 datanodes.
> We have some Java applications writing to HDFS, and we often see the
> following errors in our application logs:
> java.io.IOException: All datanodes xxx.xxx.xxx.xxx:50010 are bad. Aborting...
> 	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3093)
> 	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2200(DFSClient.java:2586)
> 	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2790)
> I went through some old discussions, and it looks like manual rebalancing
> is what is required in this case, and that we should also set
> dfs.datanode.du.reserved.
> However, I'd like to understand whether one disk filling up to 100% can
> cause the errors we are seeing in our application.
> Also, are there any other performance implications of some disks running
> at 100% usage on a datanode?
> --
> Mayank Joshi

Nitin Pawar
