Subject: Re: Reduce phase of wordcount
From: Renato Moutinho <renato.moutinho@gmail.com>
To: user@hadoop.apache.org
Date: Mon, 6 Oct 2014 16:18:59 -0300

Hi folks,

    just as feedback: increasing mapred.tasktracker.reduce.tasks.maximum had no effect (it was already set to 8) and the job created only 1 reducer (my original scenario). However, adding mapred.reduce.tasks and setting it to a value higher than 1 (I've set it to 7) made Hadoop spawn that many reduce tasks (seven in my example), and the execution time went down to around 29 minutes (also, my servers are now frying CPU....)! My next step (I'm pushing it to the maximum) is adding a combiner..
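
For anyone following along, a minimal sketch of what that change looks like in a Hadoop 1.x (old mapred API) driver; the class and path names here are illustrative, not the actual ones from this cluster:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class WordCountDriver {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCountDriver.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(WordCountMapper.class);      // tokenizes lines, emits (word, 1)
        conf.setCombinerClass(WordCountReducer.class);   // pre-aggregates counts on the map side
        conf.setReducerClass(WordCountReducer.class);
        conf.setNumReduceTasks(7);                       // same effect as -Dmapred.reduce.tasks=7
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
      }
    }

If the driver goes through ToolRunner/GenericOptionsParser, the reducer count can also be changed without recompiling by passing -D mapred.reduce.tasks=7 on the command line.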

And no... I haven't set up this cluster just for running wordcount. Kkkkkkkk.... I'm still getting to know Hadoop. :-)

Thanks a lot for your help!

Regards,

Renato Moutinho
2014-10-05 18:53 GMT-03:00 Ulul <hadoop@ulul.org>:
Hi

You indicate that you have just one reducer, which is the default in Hadoop 1 but quite insufficient for a cluster with 7 slave nodes.
You should increase mapred.reduce.tasks, use combiners, and maybe tune mapred.tasktracker.reduce.tasks.maximum.
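
For reference, mapred.reduce.tasks is a per-job setting (default 1 in Hadoop 1), while mapred.tasktracker.reduce.tasks.maximum caps how many reduce tasks each tasktracker runs at once and lives in mapred-site.xml on the slave nodes; the value below is only an example and changing it requires a tasktracker restart:

    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>8</value>  <!-- example: up to 8 concurrent reduce tasks per node -->
    </property>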

Hope that helps
Ulul

On 05/10/2014 16:53, Renato Moutinho wrote:
Hi there,

    Thanks a lot for taking the time to answer me! Actually, this "issue" happens after all the map tasks have completed (I'm looking at the web interface). I'll try to diagnose whether it's an issue with the number of threads.. I suppose I'll have to change the logging configuration to find what's going on..

The only thing that's getting to me is the fact that the lines are repeated in the log..

Regards,

Renato Moutinho



On 05/10/2014, at 10:52, java8964 <java8964@hotmail.com> wrote:

Don't be confused by 6.03 MB/s.

The relationship between mappers and reducers is an M-to-N relationship, which means a mapper can send its data to all reducers, and one reducer can receive its input from all mappers.
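
To make that M-to-N routing concrete, this is the gist of how the default HashPartitioner picks a reducer for each map output key in Hadoop 1.x (a sketch of its getPartition method):

    // Every (word, count) pair emitted by any mapper is routed to exactly one of the
    // numReduceTasks reducers, based only on the key's hash, so the same word always
    // ends up on the same reducer.
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
      return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }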

There could be a lot of reasons why the reduce copying phase looks too slow. It could be that the mappers are still running and there is no data generated for the reducer to copy yet; or there are not enough threads on either the mapper or the reducer side to utilize the remaining CPU/memory/network bandwidth. You can google the Hadoop configurations to adjust them.
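
Two of the Hadoop 1.x knobs on that copy path, shown as mapred-site.xml entries with example values only (the defaults are 5 and 40 respectively):

    <!-- threads each reduce task uses to fetch map output in parallel -->
    <property>
      <name>mapred.reduce.parallel.copies</name>
      <value>10</value>
    </property>
    <!-- tasktracker HTTP server threads that serve map output to reducers -->
    <property>
      <name>tasktracker.http.threads</name>
      <value>80</value>
    </property>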

But just because you can get 60 MB/s in scp, complaining about only getting 6 MB/s in the log is not fair to Hadoop. Your one reducer needs to copy data from all the mappers concurrently, which makes it impossible to reach the same speed as a one-to-one, point-to-point network transfer.

The reduce stage is normally longer than the map stage, as data HAS to be transferred over the network.

But in the word count example, the data that needs to be transferred should be very small. You can ask yourself the following questions:

1) Should I use a combiner in this case? (Yes, for word count, it reduces the data that needs to be transferred; a minimal sketch follows after this list.)
2) Am I using all the reducers I can, if my cluster is underutilized and I want my job to finish fast?
3) Can I add more threads in the tasktracker to help? You need to dig into your logs to find out whether your mappers or reducers are waiting for a thread from the thread pool.
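
As an illustration of point 1, a minimal sum reducer for word count on the old mapred API could look like the sketch below (the class name is illustrative); because addition is associative and commutative, the same class can also be registered as the combiner, so partial counts are merged on the map side before the shuffle:

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class WordCountReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
      public void reduce(Text key, Iterator<IntWritable> values,
                         OutputCollector<Text, IntWritable> output, Reporter reporter)
          throws IOException {
        int sum = 0;
        while (values.hasNext()) {
          sum += values.next().get();   // add up the partial counts for this word
        }
        output.collect(key, new IntWritable(sum));
      }
    }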

Yong


Date: Fri, 3 Oct 2014 18:40:16 -0300
Subject: Reduce phase of wordcount
From: renato.moutinho@gmail.com
To: user@hadoop.apache.org

Hi people,

    I'm doing some experiments with Hadoop 1.2.1, running the wordcount sample on an 8-node cluster (master + 7 slaves). Tuning the task configuration, I've been able to make the map phase run in 22 minutes.. However, the reduce phase (which consists of a single reduce task) gets stuck at some points, making the whole job take more than 40 minutes. Looking at the logs, I've seen several lines stuck at copy at different moments, like this:

2014-10-03 18:26:34,717 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201408281149_0019_r_000000_0 0.3302721% reduce > copy (971 of 980 at 6.03 MB/s) >
2014-10-03 18:26:37,736 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201408281149_0019_r_000000_0 0.3302721% reduce > copy (971 of 980 at 6.03 MB/s) >
2014-10-03 18:26:40,754 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201408281149_0019_r_000000_0 0.3302721% reduce > copy (971 of 980 at 6.03 MB/s) >
2014-10-03 18:26:43,772 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201408281149_0019_r_000000_0 0.3302721% reduce > copy (971 of 980 at 6.03 MB/s) >

Eventually the job ends, but this information, being repeated, makes me think it's having difficulty transferring the parts from the map nodes. Is my interpretation correct on this? The transfer rate is waaay too slow compared to scp file transfer between the hosts (10 times slower). Any takes on why?

Regards,

Renato Moutinho

