From: Harsh J
Date: Sun, 26 Aug 2012 23:51:59 +0530
Subject: Re: Is there a way to turn off MAPREDUCE-2415?
To: user@hadoop.apache.org

Yes, that is true: it maintains N events in memory and then flushes them
to disk upon closure. With a reasonable size (say, 2 MB of logs) I don't
see that causing any memory fill-up issues at all, since it does cap the
buffer (and discards at the tail).

The other alternative is to lower the log level on the tasks, by setting
mapred.map.child.log.level and/or mapred.reduce.child.log.level to WARN
or ERROR.

On Sun, Aug 26, 2012 at 11:37 PM, Koert Kuipers wrote:
> Looks like mapred.userlog.limit.kb is implemented by keeping a list in
> memory, and the logs are not written to disk until the job finishes or
> is killed. That doesn't sound acceptable to me.
>
> Well, I am not the only one with this problem. See MAPREDUCE-1100.
>
> On Sun, Aug 26, 2012 at 1:58 PM, Harsh J wrote:
>>
>> Hi Koert,
>>
>> On Sun, Aug 26, 2012 at 11:20 PM, Koert Kuipers wrote:
>> > Hey Harsh,
>> > Thanks for responding!
>> > Would limiting the logging for each task via mapred.userlog.limit.kb
>> > be strictly enforced (while the job is running)? That would solve my
>> > issue of runaway logging on a job filling up the datanode disks. I
>> > would set the limit high, since in general I do want to retain logs,
>> > just not in the case where a single rogue job starts producing many
>> > gigabytes of them.
>> > Thanks!
>>
>> It is not strictly enforced the way counter limits are.
>> Exceeding it
>> wouldn't fail the task; it would only cause the extra logged events to
>> not appear at all (thereby limiting the size).
>>
>> > On Sun, Aug 26, 2012 at 1:44 PM, Harsh J wrote:
>> >>
>> >> Hi Koert,
>> >>
>> >> To answer on point: there is no turning off this feature.
>> >>
>> >> Since you don't seem to care much for logs from tasks persisting,
>> >> perhaps consider lowering mapred.userlog.retain.hours to a value
>> >> below its default of 24 hours (such as 1 hour)? Or you may even
>> >> limit the logging from each task to a certain number of KB via
>> >> mapred.userlog.limit.kb, which is unlimited by default.
>> >>
>> >> Would either of these work for you?
>> >>
>> >> On Sun, Aug 26, 2012 at 11:02 PM, Koert Kuipers wrote:
>> >> > We have smaller nodes (4 to 6 disks), and we used to write logs
>> >> > to the same disk the OS is on. So if that disk goes, I don't
>> >> > really care about tasktrackers failing. Also, the fact that logs
>> >> > were written to a single partition meant I could make sure they
>> >> > would not grow too large if someone had too-verbose logging on a
>> >> > large job. With MAPREDUCE-2415, a job that does a massive amount
>> >> > of logging can fill up all the mapred.local.dir directories,
>> >> > which in our case are on the same partition as the HDFS data
>> >> > dirs. So now faulty logging can fill up HDFS storage, which I
>> >> > really don't like. Any ideas?
>> >>
>> >> --
>> >> Harsh J
>>
>> --
>> Harsh J

--
Harsh J
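For readers landing here from the archives: the knobs discussed in this thread all live in mapred-site.xml (or can be passed per job). A sketch of what the suggested settings might look like together; the values below are illustrative examples, not recommendations from the thread:

```xml
<!-- Illustrative mapred-site.xml fragment; values are examples only. -->

<!-- Quiet down task logging (suggested as WARN or ERROR above). -->
<property>
  <name>mapred.map.child.log.level</name>
  <value>WARN</value>
</property>
<property>
  <name>mapred.reduce.child.log.level</name>
  <value>WARN</value>
</property>

<!-- Cap each task's log; example caps at ~10 MB (10240 KB). -->
<property>
  <name>mapred.userlog.limit.kb</name>
  <value>10240</value>
</property>

<!-- Retain task logs for 1 hour instead of the default 24. -->
<property>
  <name>mapred.userlog.retain.hours</name>
  <value>1</value>
</property>
```

The log-level and limit properties are job-level, so they can also be overridden for a single verbose job at submission time with `-D`, e.g. `-Dmapred.map.child.log.level=ERROR`, without touching the cluster-wide defaults.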