hadoop-common-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: Auto clean DistCache?
Date Wed, 27 Mar 2013 13:37:07 GMT
Oh! Good to know! It keeps track even of month-old entries? There is no TTL?

I was not able to find the documentation for local.cache.size or
mapreduce.tasktracker.cache.local.size in the 1.0.x branch. Do you know
where I can find it?
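
From what I can tell it is set in mapred-site.xml with the value in bytes,
so something like this should cap it near the 5GB I'm aiming for (a sketch,
assuming the 1.0.x property name is local.cache.size; the default is
supposedly 10GB):

  <property>
    <name>local.cache.size</name>
    <!-- target size of the local distributed cache, in bytes (~5GB here) -->
    <value>5368709120</value>
  </property>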

Thanks,

JM

2013/3/27 Koji Noguchi <knoguchi@yahoo-inc.com>:
>> Else, I will go for a custom script to delete all directories (and content) older
>> than 2 or 3 days…
>>
> TaskTracker (or NodeManager in 2.*) keeps the list of dist cache entries in memory.
> So if an external process (like your script) starts deleting dist cache files, there
> will be an inconsistency and you'll start seeing task initialization failures due to
> file-not-found errors.
>
> Koji
>
>
> On Mar 26, 2013, at 9:00 PM, Jean-Marc Spaggiari wrote:
>
>> For the situation I faced, it was really a disk space issue, not related
>> to the number of files. It was writing to a small partition.
>>
>> I will try local.cache.size or
>> mapreduce.tasktracker.cache.local.size to see if I can keep the final
>> total size under 5GB... Else, I will go for a custom script to
>> delete all directories (and content) older than 2 or 3 days...
>>
>> Thanks,
>>
>> JM
>>
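
(For the archives, the kind of script I had in mind, roughly. Per Koji's
warning above, this is unsafe while the TaskTracker is running, since it
tracks these entries in memory, so treat it purely as a sketch; the path
comes from my own configuration:

  # remove distributed cache dirs not modified for 3+ days
  find /mapred/local/taskTracker/hadoop/distcache -mindepth 1 -maxdepth 1 \
    -type d -mtime +3 -exec rm -rf {} +
)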
>> 2013/3/26 Abdelrahman Shettia <ashettia@hortonworks.com>:
>>> Let me clarify: if the number of files or directories in those distributed
>>> cache dirs reaches the limit (around 32K, depending on the OS and filesystem
>>> configuration), the OS will not be able to create any more files/dirs, and
>>> thus M-R jobs won't get initiated on those TaskTracker machines. Hope this helps.
>>>
>>>
>>> Thanks
>>>
>>>
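
(Side note: you can see how close a partition is to that limit with df -i,
which reports inode usage, and count the cache entries directly, e.g.:

  df -i /mapred/local
  ls /mapred/local/taskTracker/hadoop/distcache | wc -l

The ~32K figure matches ext3's per-directory subdirectory cap; other
filesystems have different limits.)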
>>> On Tue, Mar 26, 2013 at 1:44 PM, Vinod Kumar Vavilapalli
>>> <vinodkv@hortonworks.com> wrote:
>>>>
>>>>
>>>> The files are never all opened at the same time, so you shouldn't see
>>>> any "# of open files exceeded" errors.
>>>>
>>>> Thanks,
>>>> +Vinod Kumar Vavilapalli
>>>> Hortonworks Inc.
>>>> http://hortonworks.com/
>>>>
>>>> On Mar 26, 2013, at 12:53 PM, Abdelrahman Shettia wrote:
>>>>
>>>> Hi JM ,
>>>>
>>>> Actually, these dirs need to be purged by a script that keeps the last 2
>>>> days' worth of files. Otherwise you may run into a "# of open files
>>>> exceeded" error.
>>>>
>>>> Thanks
>>>>
>>>>
>>>> On Mar 25, 2013, at 5:16 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org>
>>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Each time my MR job is run, a directory is created on the TaskTracker
>>>> under mapred/local/taskTracker/hadoop/distcache (based on my
>>>> configuration).
>>>>
>>>> I looked at the directory today, and it's hosting thousands of
>>>> directories and more than 8GB of data there.
>>>>
>>>> Is there a way to automatically delete this directory when the job is
>>>> done?
>>>>
>>>> Thanks,
>>>>
>>>> JM
>>>>
>>>
>
