We have a job that cleans up the mapred.local directory, so that's not it.
I have done some further looking at data usage on the datanodes and 99%
of the space used is under the dfs.data.dir/current directory. What would
be under 'current' that wasn't part of HDFS?
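
One way to narrow this down would be to tally bytes per entry under each
data directory and compare the result against what the NameNode reports as
DFS Used. A minimal sketch, assuming Python 3 is available on the datanodes;
the function name and the example path are hypothetical, and it sums
apparent file sizes rather than allocated disk blocks, so it will not match
`du` exactly.

```python
import os


def usage_by_subdir(root):
    """Return {entry name: total bytes} for each immediate child of root.

    Directories are walked recursively; symlinks are not followed.
    """
    totals = {}
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            total = 0
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    path = os.path.join(dirpath, name)
                    try:
                        total += os.lstat(path).st_size
                    except OSError:
                        # File disappeared while we were walking; skip it.
                        pass
            totals[entry.name] = total
        elif entry.is_file(follow_symlinks=False):
            totals[entry.name] = entry.stat(follow_symlinks=False).st_size
    return totals


def print_report(root):
    """Print entries under root, largest first."""
    for name, size in sorted(usage_by_subdir(root).items(),
                             key=lambda kv: kv[1], reverse=True):
        print(f"{size:>15d}  {name}")


# Example (hypothetical path; use whatever dfs.data.dir is set to):
# print_report("/data/1/dfs/data")
```

Running this against each directory listed in dfs.data.dir would show
whether the space really sits under 'current' or in a sibling entry.
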
On 5/13/11 3:12 PM, "Allen Wittenauer" <aw@apache.org> wrote:
>
>On May 13, 2011, at 10:48 AM, Todd Lipcon wrote:
>>
>>
>>> 2) Any ideas on what is driving the growth in Non DFS Used space? I
>>> looked for things like growing log files on the datanodes but didn't
>>> find anything.
>>>
>>
>> Logs are one possible culprit. Another is to look for old files that
>> might be orphaned in your mapred.local.dir - there have been bugs in the
>> past where we've leaked files. If you shut down the TaskTrackers, you can
>> safely delete everything from within mapred.local.dirs.
>
> Part of our S.O.P. during Hadoop bounces is to wipe mapred.local out.
> The TT doesn't properly clean up after itself.