hadoop-user mailing list archives

From Brahma Reddy Battula <brahmareddy.batt...@huawei.com>
Subject RE: DFS Used V/S Non DFS Used
Date Sat, 11 Oct 2014 05:44:26 GMT
Hi Manoj


Non DFS Used is any data in the filesystem of the datanode(s) that isn't in dfs.datanode.data.dir.

This includes log files, MapReduce shuffle output, and local copies of data files (if
you put them on a datanode).

Use du or a similar tool to see what's taking up the space in your filesystem.


"Non DFS Used" is calculated by the following formula:

Non DFS Used = Configured Capacity - Remaining Space - DFS Used

It is still confusing, at least for me.

Because Configured Capacity = Total Disk Space - Reserved Space.

So Non DFS used = ( Total Disk Space - Reserved Space) - Remaining Space - DFS Used

Let's take an example. Assume I have a 100 GB disk, and I set the reserved space (dfs.datanode.du.reserved)
to 30 GB.

On that disk, the system and other non-HDFS files take up 40 GB, and DFS Used is 10 GB. If you run df -h,
you will see 50 GB available on that disk volume.

In the HDFS web UI, it will show:

Non DFS Used = 100 GB (Total) - 30 GB (Reserved) - 10 GB (DFS Used) - 50 GB (Remaining) = 10 GB

So it actually means: you initially configured 30 GB to be reserved for non-DFS usage and 70 GB
for HDFS. However, it turns out non-DFS usage has exceeded the 30 GB reservation and eaten up 10 GB
of space that should belong to HDFS!
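Plugging the numbers above into the formula makes the arithmetic concrete (a small shell sketch; the values are the ones from this example, in GB):

```shell
#!/bin/sh
# Values from the example above, in GB.
total=100      # raw disk size
reserved=30    # dfs.datanode.du.reserved
dfs_used=10    # space used by HDFS block data
remaining=50   # free space reported by df

# Configured Capacity = Total Disk Space - Reserved Space
configured=$((total - reserved))

# Non DFS Used = Configured Capacity - Remaining Space - DFS Used
non_dfs=$((configured - remaining - dfs_used))

echo "Configured Capacity: ${configured} GB"   # 70 GB
echo "Non DFS Used:        ${non_dfs} GB"      # 10 GB
```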

The term "Non DFS Used" should really be renamed to something like "how much of the configured DFS
capacity is occupied by non-DFS use".

And rather than trying to figure out inside Hadoop why the non-DFS usage is so high, one should look at the filesystem itself.

One useful command is lsof | grep delete, which will help you identify open files that
have already been deleted. Sometimes Hadoop processes (such as Hive, YARN, MapReduce, and HDFS daemons) hold
references to these already-deleted files, and those references keep occupying disk space.

Also, du -hsx * | sort -rh | head -10 helps list the top ten largest directories.
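The two checks above can be combined into a quick triage pass. This is a sketch, assuming a POSIX shell; VOLUME is a placeholder you should point at the mount backing dfs.datanode.data.dir:

```shell
#!/bin/sh
# Quick triage of what is eating non-DFS space on a datanode volume.
# VOLUME is a placeholder; substitute your actual data volume mount point.
VOLUME="${VOLUME:-/data1}"

# Top ten largest entries on the volume (-x keeps du on one filesystem).
du -hsx "$VOLUME"/* 2>/dev/null | sort -rh | head -10

# Files deleted while still held open: du no longer sees them, but df still
# counts their blocks until the holding process closes the descriptor.
lsof 2>/dev/null | grep deleted | head
```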





Thanks & Regards



Brahma Reddy Battula



HUAWEI TECHNOLOGIES INDIA PVT.LTD.
Ground,1&2 floors,Solitaire,
139/26,Amarjyoti Layout,Intermediate Ring Road,Domlur
Bangalore - 560 071 , India
Tel : +91- 80- 3980 9600  Ext No: 4905
Mobile : +91   9620022006
Fax : +91-80-41118578

________________________________
From: Manoj Samel [manojsameltech@gmail.com]
Sent: Saturday, October 11, 2014 3:08 AM
To: user@hadoop.apache.org
Subject: Re: DFS Used V/S Non DFS Used

Thanks Suresh - still not clear

Say "dfs.datanode.du.reserved" is not set (the default seems to be 0), yet the reported "Non DFS Used"
number is non-zero. What does this mean? What is being referred to as "temp files", and how
can they encroach in the example of /disk1/datanode, /disk2/datanode, etc.?

Thanks,

On Fri, Oct 10, 2014 at 2:29 PM, Suresh Srinivas <suresh@hortonworks.com<mailto:suresh@hortonworks.com>>
wrote:
Here is the information from - https://issues.apache.org/jira/browse/HADOOP-4430?focusedCommentId=12640259&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12640259
Here are the definitions of the data reported on the Web UI:
Configured Capacity: Disk space corresponding to all the data directories - Reserved space
as defined by dfs.datanode.du.reserved
DFS Used: Space used by DFS
Non DFS Used: 0 if the temporary files do not exceed reserved space. Otherwise this is the
size by which temporary files exceed the reserved space and encroach into the DFS configured
space.
DFS Remaining: (Configured Capacity - DFS Used - Non DFS Used)
DFS Used %: (DFS Used / Configured Capacity) * 100
DFS Remaining % = (DFS Remaining / Configured Capacity) * 100
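Those definitions can be checked against the worked example earlier in this thread with a little shell arithmetic (a sketch; "temp_files" is my label for all non-HDFS data on the volume, not a Hadoop term):

```shell
#!/bin/sh
# Web UI definitions from HADOOP-4430, in GB (integer arithmetic).
raw=100         # raw disk space under the data directories
reserved=30     # dfs.datanode.du.reserved
dfs_used=10
temp_files=40   # all non-HDFS data on the volume (my label, not Hadoop's)

configured=$((raw - reserved))                 # Configured Capacity

# Non DFS Used: only the part of temporary files exceeding the reservation.
over=$((temp_files - reserved))
if [ "$over" -gt 0 ]; then non_dfs=$over; else non_dfs=0; fi

remaining=$((configured - dfs_used - non_dfs)) # DFS Remaining
dfs_used_pct=$((100 * dfs_used / configured))  # DFS Used % (truncated)

echo "Configured=$configured NonDFS=$non_dfs Remaining=$remaining Used%=$dfs_used_pct"
# Matches the example earlier: Configured=70 NonDFS=10 Remaining=50 Used%=14
```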

On Fri, Oct 10, 2014 at 2:21 PM, Manoj Samel <manojsameltech@gmail.com<mailto:manojsameltech@gmail.com>>
wrote:
Hi,

It's not clear to me how this computation is done.

For the sake of discussion, say the machine with the datanode has two disks, /disk1 and /disk2, and
each of these disks has a directory for datanode data and a directory for non-datanode usage.

/disk1/datanode
/disk1/non-datanode
/disk2/datanode
/disk2/non-datanode

The dfs.datanode.data.dir is set to "/disk1/datanode,/disk2/datanode".

With this, what do DFS Used and Non DFS Used indicate? Do they indicate SUM(/disk*/datanode)
and SUM(/disk*/non-datanode), respectively?

Thanks,







