Subject: Re: Is there a way to turn off MAPREDUCE-2415?
From: Koert Kuipers
To: user@hadoop.apache.org
Date: Sun, 26 Aug 2012 14:39:38 -0400

Harsh,

I see the problem as follows: usually we want to let people log whatever they want, as long as it doesn't threaten the stability of the system.

However, every once in a while somebody will submit a job that is overly verbose and generates many gigabytes of logs in minutes. This is typically an honest mistake, and the person doesn't realize what is going on (why is my job so slow?). Limiting the general logging levels for everyone to deal with these mistakes seems ineffective. Telling the person to change the logging level for his job won't work either, since he/she doesn't realize what is going on and certainly didn't know in advance.

So all I really want is a very high, hard limit on the log size per job, to protect the system. Say many hundreds of megabytes, or even gigabytes. But when this limit is reached I want the logging to stop from that point on, or even the job to be killed. mapred.userlog.limit.kb seems the wrong tool for the job.

Before the logging got moved to the mapred.local.dir, I had a limit simply by limiting the size of the partition that logging went to.

Anyhow, looks like I will have to wait for MAPREDUCE-1100.

Have a good day!
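[Archive note: for reference, the MRv1 properties discussed in this thread would go into mapred-site.xml roughly as below. This is a sketch only; the values are illustrative assumptions, not recommendations from the thread.]

```xml
<!-- Sketch of the per-task log knobs named in this thread (MRv1 names).
     Values are illustrative assumptions. -->
<property>
  <name>mapred.userlog.limit.kb</name>
  <value>512000</value> <!-- cap each task's log at ~500 MB; 0 = unlimited (the default) -->
</property>
<property>
  <name>mapred.userlog.retain.hours</name>
  <value>1</value> <!-- keep task logs for 1 hour instead of the default 24 -->
</property>
<property>
  <name>mapred.map.child.log.level</name>
  <value>WARN</value>
</property>
<property>
  <name>mapred.reduce.child.log.level</name>
  <value>WARN</value>
</property>
```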
Koert

On Sun, Aug 26, 2012 at 2:21 PM, Harsh J wrote:
> Yes that is true, it does maintain N events in memory and then flushes
> them down to disk upon closure. With a reasonable size (2 MB of logs,
> say) I don't see that causing any memory fill-up issues at all, since
> it does cap (and discard at tail).
>
> The other alternative may be to switch down the log level on the task,
> via mapred.map.child.log.level and/or mapred.reduce.child.log.level
> set to WARN or ERROR.
>
> On Sun, Aug 26, 2012 at 11:37 PM, Koert Kuipers wrote:
> > Looks like mapred.userlog.limit.kb is implemented by keeping some list
> > in memory, and the logs are not written to disk until the job finishes
> > or is killed. That doesn't sound acceptable to me.
> >
> > Well, I am not the only one with this problem. See MAPREDUCE-1100.
> >
> > On Sun, Aug 26, 2012 at 1:58 PM, Harsh J wrote:
> >> Hi Koert,
> >>
> >> On Sun, Aug 26, 2012 at 11:20 PM, Koert Kuipers wrote:
> >> > Hey Harsh,
> >> > Thanks for responding!
> >> > Would limiting the logging for each task via mapred.userlog.limit.kb
> >> > be strictly enforced (while the job is running)? That would solve my
> >> > issue of runaway logging on a job filling up the datanode disks. I
> >> > would set the limit high, since in general I do want to retain logs,
> >> > just not in case a single rogue job starts producing many gigabytes
> >> > of logs.
> >> > Thanks!
> >>
> >> It is not strictly enforced the way counter limits are. Exceeding it
> >> wouldn't fail the task, only cause the extra logged events to not
> >> appear at all (thereby limiting the size).
> >>
> >> > On Sun, Aug 26, 2012 at 1:44 PM, Harsh J wrote:
> >> >> Hi Koert,
> >> >>
> >> >> To answer on point, there is no turning off this feature.
> >> >>
> >> >> Since you don't seem to care much for logs from tasks persisting,
> >> >> perhaps consider lowering mapred.userlog.retain.hours to a lower
> >> >> value than 24 hours (such as 1h)?
> >> >> Or you may even limit the logging from each task to a certain
> >> >> amount of KB via mapred.userlog.limit.kb, which is unlimited by
> >> >> default.
> >> >>
> >> >> Would either of these work for you?
> >> >>
> >> >> On Sun, Aug 26, 2012 at 11:02 PM, Koert Kuipers wrote:
> >> >> > We have smaller nodes (4 to 6 disks), and we used to write logs
> >> >> > to the same disk as the OS. So if that disk goes, then I don't
> >> >> > really care about tasktrackers failing. Also, the fact that logs
> >> >> > were written to a single partition meant that I could make sure
> >> >> > they would not grow too large in case someone had too-verbose
> >> >> > logging on a large job. With MAPREDUCE-2415, a job that does a
> >> >> > massive amount of logging can fill up all the mapred.local.dir,
> >> >> > which in our case are on the same partition as the hdfs data
> >> >> > dirs, so now faulty logging can fill up hdfs storage, which I
> >> >> > really don't like. Any ideas?
> >> >>
> >> >> --
> >> >> Harsh J
> >>
> >> --
> >> Harsh J
>
> --
> Harsh J
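[Archive note: since mapred.userlog.limit.kb buffers events in memory and a streaming limit (MAPREDUCE-1100) was still pending at the time, one workaround in the spirit of Koert's old partition-size trick is an external watchdog over the log directory. A minimal sketch; the directory path and cap are hypothetical, and what to do on overflow (alert, kill the job) is left to the operator.]

```shell
# Hypothetical watchdog sketch, not a Hadoop feature: report when the total
# size of a log directory exceeds a hard cap, so an operator (or cron job)
# can react before the partition fills up.
check_log_usage() {
  dir="$1"     # e.g. the tasktracker userlogs directory (assumed path)
  cap_kb="$2"  # hard cap in KB, illustrative

  # du -sk prints total usage of the directory in KB
  used_kb=$(du -sk "$dir" | awk '{print $1}')
  if [ "$used_kb" -gt "$cap_kb" ]; then
    echo "OVER $used_kb"   # hook point: alert, or kill the offending job
  else
    echo "OK $used_kb"
  fi
}

# Example (paths are assumptions):
#   check_log_usage /var/log/hadoop/userlogs 1048576   # 1 GB cap
```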