hadoop-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Is there a way to turn off MAPREDUCE-2415?
Date Sun, 26 Aug 2012 17:58:49 GMT
Hi Koert,

On Sun, Aug 26, 2012 at 11:20 PM, Koert Kuipers <koert@tresata.com> wrote:
> Hey Harsh,
> Thanks for responding!
> Would limiting the logging for each task via mapred.userlog.limit.kb be
> strictly enforced (while the job is running)? That would solve my issue of
> runaway logging on a job filling up the datanode disks. I would set the
> limit high since in general i do want to retain logs, just not in case a
> single rogue job starts producing many gigabytes of logs.
> Thanks!

It is not strictly enforced the way counter limits are. Exceeding it
won't fail the task; it only causes the extra logged events to not
appear at all (thereby limiting the size).
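
For reference, a minimal sketch of how the two properties mentioned in
this thread might be set in mapred-site.xml. The values are purely
illustrative assumptions (a high per-task cap, as you describe, and the
1h retention example from my earlier mail), not recommendations:

```xml
<!-- mapred-site.xml fragment (sketch; values are assumed examples) -->
<configuration>
  <!-- Cap on each task's user log size, in KB. Assumed value: ~100 MB. -->
  <property>
    <name>mapred.userlog.limit.kb</name>
    <value>102400</value>
  </property>
  <!-- How long task logs are retained, in hours. Assumed value: 1. -->
  <property>
    <name>mapred.userlog.retain.hours</name>
    <value>1</value>
  </property>
</configuration>
```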

> On Sun, Aug 26, 2012 at 1:44 PM, Harsh J <harsh@cloudera.com> wrote:
>>
>> Hi Koert,
>>
>> To answer on point, there is no turning off this feature.
>>
>> Since you don't seem to care much for logs from tasks persisting,
>> perhaps consider setting mapred.userlog.retain.hours to a value
>> lower than 24 hours (such as 1h)? Or you may even limit the logging
>> from each task to a certain amount of KB via mapred.userlog.limit.kb,
>> which is unlimited by default.
>>
>> Would either of these work for you?
>>
>> On Sun, Aug 26, 2012 at 11:02 PM, Koert Kuipers <koert@tresata.com> wrote:
>> > We have smaller nodes (4 to 6 disks), and we used to write logs to
>> > the same disk as where the OS is. So if that disk goes, I don't
>> > really care about tasktrackers failing. Also, the fact that logs
>> > were written to a single partition meant that I could make sure
>> > they would not grow too large in case someone had too verbose
>> > logging on a large job. With MAPREDUCE-2415, a job that does a
>> > massive amount of logging can fill up all the mapred.local.dir
>> > directories, which in our case are on the same partition as the
>> > hdfs data dirs, so now faulty logging can fill up hdfs storage,
>> > which I really don't like. Any ideas?
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>
>



-- 
Harsh J
