hadoop-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Is there a way to turn off MAPREDUCE-2415?
Date Sun, 26 Aug 2012 18:21:59 GMT
Yes, that is true: it does maintain N events in memory and then flushes
them to disk upon closure. With a reasonable size (say 2 MB of logs)
I don't see that causing any memory fill-up issues at all, since
it does cap the buffer (and discards at the tail).
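
Roughly, you can picture it as a bounded buffer like the sketch below. This
is just an illustration, not the actual Hadoop code, and dropping the oldest
events first is my assumption:

  import java.io.Closeable;
  import java.io.IOException;
  import java.io.Writer;
  import java.util.ArrayDeque;

  // Illustration only: keep at most maxEvents log lines in memory,
  // then write whatever is retained to disk when the task closes.
  class CappedEventLog implements Closeable {
    private final ArrayDeque<String> events = new ArrayDeque<String>();
    private final int maxEvents;
    private final Writer out;

    CappedEventLog(int maxEvents, Writer out) {
      this.maxEvents = maxEvents;
      this.out = out;
    }

    void log(String event) {
      events.addLast(event);
      if (events.size() > maxEvents) {
        events.removeFirst();  // assumption: the oldest events are dropped
      }
    }

    public void close() throws IOException {
      for (String e : events) {  // flush the retained events to disk on closure
        out.write(e);
        out.write('\n');
      }
      out.close();
    }
  }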

The other alternative may be to lower the log level on the tasks, by
setting mapred.map.child.log.level and/or mapred.reduce.child.log.level
to WARN or ERROR.
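
For illustration, here is a minimal sketch of setting those knobs per job
through the old MRv1 JobConf API; the class name and the elided job setup
are placeholders, and the same keys can also be passed on the command line
with -D or put in mapred-site.xml:

  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;

  public class QuietLogsJob {
    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(QuietLogsJob.class);
      conf.set("mapred.map.child.log.level", "WARN");     // map tasks: WARN and above only
      conf.set("mapred.reduce.child.log.level", "WARN");  // reduce tasks: WARN and above only
      conf.set("mapred.userlog.limit.kb", "2048");        // cap each task's userlog at ~2 MB
      // ... set mapper/reducer classes, input/output paths, etc. as usual ...
      JobClient.runJob(conf);
    }
  }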

On Sun, Aug 26, 2012 at 11:37 PM, Koert Kuipers <koert@tresata.com> wrote:
> Looks like mapred.userlog.limit.kb is implemented by keeping some list in
> memory, and the logs are not written to disk until the job finishes or is
> killed. That doesn't sound acceptable to me.
>
> Well, I am not the only one with this problem. See MAPREDUCE-1100.
>
>
> On Sun, Aug 26, 2012 at 1:58 PM, Harsh J <harsh@cloudera.com> wrote:
>>
>> Hi Koert,
>>
>> On Sun, Aug 26, 2012 at 11:20 PM, Koert Kuipers <koert@tresata.com> wrote:
>> > Hey Harsh,
>> > Thanks for responding!
>> > Would limiting the logging for each task via mapred.userlog.limit.kb be
>> > strictly enforced (while the job is running)? That would solve my issue
>> > of runaway logging from a job filling up the datanode disks. I would set
>> > the limit high, since in general I do want to retain logs, just not when
>> > a single rogue job starts producing many gigabytes of logs.
>> > Thanks!
>>
>> It is not strictly enforced the way counter limits are. Exceeding it
>> wouldn't fail the task; it would only cause the extra logged events not
>> to appear at all (thereby limiting the size).
>>
>> > On Sun, Aug 26, 2012 at 1:44 PM, Harsh J <harsh@cloudera.com> wrote:
>> >>
>> >> Hi Koert,
>> >>
>> >> To answer on point, there is no turning off this feature.
>> >>
>> >> Since you don't seem to care much about task logs persisting, perhaps
>> >> consider lowering mapred.userlog.retain.hours to a value below its
>> >> 24-hour default (such as 1 hour)? Or you may even limit the logging
>> >> from each task to a certain number of KB via mapred.userlog.limit.kb,
>> >> which is unlimited by default.
>> >>
>> >> Would either of these work for you?
>> >>
>> >> On Sun, Aug 26, 2012 at 11:02 PM, Koert Kuipers <koert@tresata.com>
>> >> wrote:
>> >> > We have smaller nodes (4 to 6 disks), and we used to write logs to
>> >> > the same disk as the OS. So if that disk goes, I don't really care
>> >> > about tasktrackers failing. Also, the fact that logs were written to
>> >> > a single partition meant that I could make sure they would not grow
>> >> > too large in case someone had overly verbose logging on a large job.
>> >> > With MAPREDUCE-2415, a job that does a massive amount of logging can
>> >> > fill up all the mapred.local.dir directories, which in our case are
>> >> > on the same partition as the HDFS data dirs, so now faulty logging
>> >> > can fill up HDFS storage, which I really don't like. Any ideas?
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Harsh J
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>
>



-- 
Harsh J
