hadoop-mapreduce-user mailing list archives

From Abdelrahman Shettia <ashet...@hortonworks.com>
Subject Re: Auto clean DistCache?
Date Tue, 26 Mar 2013 23:12:56 GMT
Let me clarify: if there are lots of files or directories, up to 32K (depending
on the OS/filesystem configuration), in those distributed cache dirs, the OS
will not be able to create any more files/dirs, so M-R jobs won't get initiated
on those TaskTracker machines.
Hope this helps.
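[Editor's note: the purge script suggested later in this thread (keep only the last two days of cache entries) could be sketched roughly as below. The `purge_distcache` helper name is invented for illustration, and the cache path is the one JM mentions; adjust it to your own mapred.local.dir layout.]

```shell
#!/bin/sh
# Hypothetical cleanup sketch, not part of Hadoop itself.
# purge_distcache: remove entries directly under the given cache dir
# whose modification time is older than 2 days.
purge_distcache() {
  cache_dir="$1"
  # -mindepth 1/-maxdepth 1 keeps the cache dir itself and only
  # touches its immediate children; -mtime +2 selects entries
  # modified more than 2 days ago.
  find "$cache_dir" -mindepth 1 -maxdepth 1 -mtime +2 -exec rm -rf {} +
}

# Example (path from this thread; adjust to your configuration):
# purge_distcache /mapred/local/taskTracker/hadoop/distcache
```

Run from cron as the user that owns the TaskTracker's local dirs, so the next job can still create new cache entries.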


Thanks


On Tue, Mar 26, 2013 at 1:44 PM, Vinod Kumar Vavilapalli <
vinodkv@hortonworks.com> wrote:

>
> All the files are not opened at the same time ever, so you shouldn't see
> any "# of open files exceeds error".
>
> Thanks,
> +Vinod Kumar Vavilapalli
> Hortonworks Inc.
> http://hortonworks.com/
>
> On Mar 26, 2013, at 12:53 PM, Abdelrahman Shettia wrote:
>
> Hi JM,
>
> Actually these dirs need to be purged by a script that keeps the last 2
> days' worth of files; otherwise you may run into a "# of open files
> exceeds" error.
>
> Thanks
>
>
> On Mar 25, 2013, at 5:16 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org>
> wrote:
>
> Hi,
>
> Each time my MR job is run, a directory is created on the TaskTracker
> under mapred/local/taskTracker/hadoop/distcache (based on my
> configuration).
>
> I looked at the directory today, and it's hosting thousands of
> directories and more than 8GB of data there.
>
> Is there a way to automatically delete this directory when the job is done?
>
> Thanks,
>
> JM
