Mailing-List: contact giraph-user-help@incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: giraph-user@incubator.apache.org
Message-ID: <4F7785FA.20506@apache.org>
Date: Sat, 31 Mar 2012 15:32:26 -0700
From: Avery Ching <aching@apache.org>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7;
 rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2
MIME-Version: 1.0
To: giraph-user@incubator.apache.org
Subject: Re: Incomplete output when running PageRank example
References: 
 <CAAro7RDknHt8thFKLikexdKgO2CrdjvyGsdaFB6A+oiW5sqKmQ@mail.gmail.com>
 <81366E6A-43F9-45B9-98AB-725983047119@deri.org>
 <CAAro7RAHJVdwod2noOGWfs=_OHqO1C70QftoOxzuLrnaTF2KNA@mail.gmail.com>
In-Reply-To: 
 <CAAro7RAHJVdwod2noOGWfs=_OHqO1C70QftoOxzuLrnaTF2KNA@mail.gmail.com>
Content-Type: multipart/alternative;
 boundary="------------010903060308050503070408"

This is a multi-part message in MIME format.
--------------010903060308050503070408
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

As Benjamin mentioned, it depends on the number of map tasks your Hadoop
install is running with.  You could set it in proportion to the number
of cores the machine has if you like, but try Benjamin's suggestions
first to get it working with more map tasks.  I believe that if you don't
set the value yourself, the default is 2, which is not enough for 2
workers, since Giraph needs one more map task on top of the workers.
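
To make the arithmetic concrete, here is a rough Python sketch. It is not
part of Giraph or Hadoop: the function names are made up for illustration,
and it assumes a single tasktracker, as on a laptop. It reads
mapred.tasktracker.map.tasks.maximum from the text of mapred-site.xml,
falls back to the default of 2 that I believe applies, and checks whether
the -w value plus the one coordinating map task fits.

```python
import xml.etree.ElementTree as ET

# Fallback when the property is absent. 2 is the default mentioned in this
# thread; treat it as an assumption and check your own install.
ASSUMED_DEFAULT_MAP_SLOTS = 2

PROPERTY_NAME = "mapred.tasktracker.map.tasks.maximum"


def map_slots(mapred_site_xml: str) -> int:
    """Return the map slot limit found in the given mapred-site.xml text."""
    root = ET.fromstring(mapred_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == PROPERTY_NAME:
            return int(prop.findtext("value", "").strip())
    return ASSUMED_DEFAULT_MAP_SLOTS


def max_workers(slots: int) -> int:
    """One map task goes to the coordinating task; the rest can be workers."""
    return max(slots - 1, 0)


def fits(workers: int, slots: int) -> bool:
    """True if the -w value plus the coordinating task fits in the slots."""
    return workers + 1 <= slots


if __name__ == "__main__":
    empty_conf = "<configuration></configuration>"
    four_slot_conf = """
    <configuration>
      <property>
        <name>mapred.tasktracker.map.tasks.maximum</name>
        <value>4</value>
      </property>
    </configuration>
    """
    for label, conf in (("no setting", empty_conf), ("4 slots", four_slot_conf)):
        slots = map_slots(conf)
        print(label, "-> slots:", slots, "largest -w:", max_workers(slots))
```

Under these assumptions, an install with no setting has 2 slots, so the
largest usable value is -w 1, and Benjamin's value of 4 allows up to -w 3.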

Avery

On 3/31/12 11:51 AM, Robert Davis wrote:
> Thanks a lot, Benjamin.
>
> I set the number of map tasks to 2 since I only have a dual-core
> processor (though with hyper-threading) on my laptop. I ran it again, but
> the result still appeared incorrect. The output is as follows.
>
> Regards,
> Robert
>
> $ hadoop jar target/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar 
> org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 50000000 
> -w 2
> 12/03/31 11:40:08 INFO benchmark.PageRankBenchmark: Using class 
> org.apache.giraph.benchmark.HashMapVertexPageRankBenchmark
> 12/03/31 11:40:10 WARN bsp.BspOutputFormat: checkOutputSpecs: 
> ImmutableOutputCommiter will not check anything
> 12/03/31 11:40:11 INFO mapred.JobClient: Running job: 
> job_201203301834_0004
> 12/03/31 11:40:12 INFO mapred.JobClient:  map 0% reduce 0%
> 12/03/31 11:40:38 INFO mapred.JobClient:  map 33% reduce 0%
> 12/03/31 11:45:44 INFO mapred.JobClient: Job complete: 
> job_201203301834_0004
> 12/03/31 11:45:44 INFO mapred.JobClient: Counters: 5
> 12/03/31 11:45:44 INFO mapred.JobClient:   Job Counters
> 12/03/31 11:45:44 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=620769
> 12/03/31 11:45:44 INFO mapred.JobClient:     Total time spent by all 
> reduces waiting after reserving slots (ms)=0
> 12/03/31 11:45:44 INFO mapred.JobClient:     Total time spent by all 
> maps waiting after reserving slots (ms)=0
> 12/03/31 11:45:44 INFO mapred.JobClient:     Launched map tasks=2
> 12/03/31 11:45:44 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=4377
>
> On Sat, Mar 31, 2012 at 3:45 AM, Benjamin Heitmann
> <benjamin.heitmann@deri.org> wrote:
>
>
>     Hi Robert,
>
>     On 31 Mar 2012, at 09:42, Robert Davis wrote:
>
>     > Hello Giraphers,
>     >
>     > I am new to Giraph. I just checked out a version and ran it in
>     > single-machine mode. I got the following results, which have no
>     > Giraph counter information (unlike the example output). I am
>     > wondering what has gone wrong. The Hadoop version I am using is 1.0.
>
>     It looks like your Giraph job did not actually finish the calculation.
>
>     As you say that you are new to Giraph, there is a good chance
>     that you ran into the same issue that tripped me up a few
>     weeks ago ;)
>
>     (I am not sure where the following information should be documented,
>     maybe this issue should be documented on the same page which
>     describes how to run the pagerank benchmark)
>
>     You provide the parameter "-w 30" to your job, which means that it
>     will use 30 workers. Maybe that's from the example on the Giraph
>     web page.
>     However, there is one very important caveat for the number of workers:
>     the number of workers must be no larger than
>     mapred.tasktracker.map.tasks.maximum minus one.
>
>     Giraph will use one mapper task to start some sort of coordinating
>     worker (probably something ZooKeeper-specific),
>     and then it will start the number of workers that you specified
>     using -w. If the total number of tasks (the workers plus the
>     coordinating task) is bigger than the maximum number of map tasks,
>     then your Giraph job will never finish the calculation.
>     (There might be a config option for specifying how many workers
>     need to be finished in order to start the next superstep,
>     but I did not try that personally.)
>
>     If you are running Hadoop/Giraph on your personal machine, then I
>     would recommend using 3 workers, and you should edit your
>     conf/mapred-site.xml
>     to include values for the following configuration parameters
>     (and then restart Hadoop):
>
>     <property>
>     <name>mapred.map.tasks</name>
>     <value>4</value>
>     </property>
>     <property>
>     <name>mapred.reduce.tasks</name>
>     <value>4</value>
>     </property>
>     <property>
>     <name>mapred.tasktracker.map.tasks.maximum</name>
>     <value>4</value>
>     </property>
>     <property>
>     <name>mapred.tasktracker.reduce.tasks.maximum</name>
>     <value>4</value>
>     </property>
>
>
>


--------------010903060308050503070408--