hadoop-hdfs-user mailing list archives

From suresh srinivas <srini30...@gmail.com>
Subject Re: Rapid growth in Non DFS Used disk space
Date Sun, 15 May 2011 04:20:44 GMT
dfs.data.dir/current is used by datanodes to store blocks. This directory
should only have files starting with blk_*

Things to check:
- Are there other files that are not blk related?
- Did you manually copy the content of one storage dir to another? (some
folks did this when they added new disks)
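
Both checks can be scripted. The following is only a rough sketch in Python, not an official Hadoop tool: the function names are made up for illustration, and it assumes block files and their .meta companions are named blk_*. Datanode bookkeeping files (for example VERSION) will also show up in the first report and are expected; the interesting entries are large files that nobody recognises.

```python
import os
from collections import defaultdict


def find_non_block_files(current_dir):
    """Return (path, size_in_bytes) for every file under current_dir
    whose name does not start with 'blk_', largest first."""
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(current_dir):
        for name in filenames:
            if not name.startswith("blk_"):
                path = os.path.join(dirpath, name)
                suspects.append((path, os.path.getsize(path)))
    suspects.sort(key=lambda item: item[1], reverse=True)
    return suspects


def find_duplicate_blocks(data_dirs):
    """Return {block_file_name: [data_dir, ...]} for block files that
    appear under the 'current' directory of more than one storage dir,
    which is what a manual copy between disks would leave behind."""
    seen = defaultdict(set)
    for data_dir in data_dirs:
        current_dir = os.path.join(data_dir, "current")
        for _dirpath, _dirnames, filenames in os.walk(current_dir):
            for name in filenames:
                # Skip .meta files; the block file itself is enough.
                if name.startswith("blk_") and not name.endswith(".meta"):
                    seen[name].add(data_dir)
    return {name: sorted(dirs) for name, dirs in seen.items() if len(dirs) > 1}
```

Call find_non_block_files() on each dfs.data.dir/current path, and pass the full list of directories configured in dfs.data.dir to find_duplicate_blocks(). Neither function deletes anything; they only report.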


On Fri, May 13, 2011 at 1:41 PM, Kester, Scott <SKester@weather.com> wrote:

> We have a job that cleans up the mapred.local directory, so that's not it.
>  I have done some further looking at data usage on the datanodes and 99%
> of the space used is under the dfs.data.dir/current directory.  What would
> be under 'current' that wasn't part of HDFS?
>
> On 5/13/11 3:12 PM, "Allen Wittenauer" <aw@apache.org> wrote:
>
> >
> >On May 13, 2011, at 10:48 AM, Todd Lipcon wrote:
> >>
> >>
> >>> 2) Any ideas on what is driving the growth in Non DFS Used space? I
> >>> looked for things like growing log files on the datanodes but didn't find
> >>> anything.
> >>>
> >>
> >> Logs are one possible culprit. Another is to look for old files that might
> >> be orphaned in your mapred.local.dir - there have been bugs in the past
> >> where we've leaked files. If you shut down the TaskTrackers, you can safely
> >> delete everything from within mapred.local.dir.
> >
> >       Part of our S.O.P. during Hadoop bounces is to wipe mapred.local out.
> >The TT doesn't properly clean up after itself.
>
>


-- 
Regards,
Suresh
