hadoop-mapreduce-user mailing list archives

From Koert Kuipers <ko...@tresata.com>
Subject Re: Is there a way to turn off MAPREDUCE-2415?
Date Sun, 26 Aug 2012 17:50:19 GMT
Hey Harsh,
Thanks for responding!
Would limiting the logging for each task via mapred.userlog.limit.kb be
strictly enforced (while the job is running)? That would solve my issue of
runaway logging on a job filling up the datanode disks. I would set the
limit high, since in general I do want to retain logs; I just don't want a
single rogue job producing many gigabytes of them.
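
For reference, this is roughly what I have in mind for mapred-site.xml on
the tasktracker nodes (just a sketch; the 1 GB cap is an illustrative,
untested value):

  <property>
    <name>mapred.userlog.limit.kb</name>
    <!-- Cap each task attempt's logs at ~1 GB; the default of 0 means unlimited. -->
    <value>1048576</value>
  </property>
  <property>
    <name>mapred.userlog.retain.hours</name>
    <!-- Keep task logs around for 24 hours (the default). -->
    <value>24</value>
  </property>
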
Thanks!

On Sun, Aug 26, 2012 at 1:44 PM, Harsh J <harsh@cloudera.com> wrote:

> Hi Koert,
>
> To answer your question directly: there is no way to turn this feature off.
>
> Since you don't seem to care much about task logs persisting, perhaps
> consider lowering mapred.userlog.retain.hours from its default of 24
> hours to something shorter (such as 1 hour)? Or you could limit the
> logging from each task to a certain number of KB via
> mapred.userlog.limit.kb, which is unlimited by default.
>
> Would either of these work for you?
>
> On Sun, Aug 26, 2012 at 11:02 PM, Koert Kuipers <koert@tresata.com> wrote:
> > We have smaller nodes (4 to 6 disks), and we used to write logs to the
> > same disk as the OS. So if that disk goes, I don't really care about
> > tasktrackers failing. Also, the fact that logs were written to a single
> > partition meant that I could make sure they would not grow too large in
> > case someone had too-verbose logging on a large job. With MAPREDUCE-2415,
> > a job that does a massive amount of logging can fill up all of the
> > mapred.local.dir directories, which in our case are on the same partition
> > as the HDFS data dirs. So now faulty logging can fill up HDFS storage,
> > which I really don't like. Any ideas?
> >
>
> --
> Harsh J
>
