Message-ID: <4F7785FA.20506@apache.org>
Date: Sat, 31 Mar 2012 15:32:26 -0700
From: Avery Ching
To: giraph-user@incubator.apache.org
Subject: Re: Incomplete output when running PageRank example

As Benjamin mentioned, it depends on the number of map tasks your Hadoop install is running with. You could set it proportionally to the number of cores the machine has if you like, but try using Benjamin's suggestions to get it working with more map tasks. I believe that if you don't set the number of map tasks, the default is 2, which is not enough for 2 workers.

Avery

On 3/31/12 11:51 AM, Robert Davis wrote:
Thanks a lot, Benjamin.

I set the number of map tasks to 2 since I only have a dual-core processor (though with hyper-threading) on my laptop. I ran it again, but the output still looks incorrect. It is as follows.

Regards,
Robert

$ hadoop jar target/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 50000000 -w 2
12/03/31 11:40:08 INFO benchmark.PageRankBenchmark: Using class org.apache.giraph.benchmark.HashMapVertexPageRankBenchmark
12/03/31 11:40:10 WARN bsp.BspOutputFormat: checkOutputSpecs: ImmutableOutputCommiter will not check anything
12/03/31 11:40:11 INFO mapred.JobClient: Running job: job_201203301834_0004
12/03/31 11:40:12 INFO mapred.JobClient:  map 0% reduce 0%
12/03/31 11:40:38 INFO mapred.JobClient:  map 33% reduce 0%
12/03/31 11:45:44 INFO mapred.JobClient: Job complete: job_201203301834_0004
12/03/31 11:45:44 INFO mapred.JobClient: Counters: 5
12/03/31 11:45:44 INFO mapred.JobClient:   Job Counters
12/03/31 11:45:44 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=620769
12/03/31 11:45:44 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/03/31 11:45:44 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/03/31 11:45:44 INFO mapred.JobClient:     Launched map tasks=2
12/03/31 11:45:44 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=4377

On Sat, Mar 31, 2012 at 3:45 AM, Benjamin Heitmann <benjamin.heitmann@deri.org> wrote:

Hi Robert,

On 31 Mar 2012, at 09:42, Robert Davis wrote:

> Hello Giraphers,
>
> I am new to Giraph. I just checked out a version and ran it in
> single-machine mode. I got the following results, which have no Giraph
> counter information (unlike the example output). I am wondering what has
> gone wrong. The Hadoop version I am using is 1.0.

It looks like your Giraph job did not actually finish the calculation.

As you say that you are new to Giraph, there is a good chance that you ran into the same issue that tripped me up a few weeks ago ;)

(I am not sure where the following information should be documented;
maybe it belongs on the same page that describes how to run the PageRank benchmark.)

You provide the parameter "-w 30" to your job, which means that it will use 30 workers. Maybe that's from the example on the Giraph web page;
however, there is one very important caveat for the number of workers:
the number of workers must be no larger than mapred.tasktracker.map.tasks.maximum minus one.

Giraph will use one mapper task to start some sort of coordinating worker (probably something ZooKeeper specific),
and then it will start the number of workers which you specified using -w. If the number of workers plus that coordinating task is larger than the maximum number of map tasks,
then your Giraph job will never finish the actual computation.
(There might be a config option for specifying how many workers need to be finished in order to start the next superstep,
but I did not try that personally.)
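
The rule above can be sketched as a quick check. This is only an illustration of the arithmetic, not Giraph code: the function name giraph_fits is made up, and it assumes a single-machine setup where all map slots come from one tasktracker.

```python
def giraph_fits(workers: int, map_slots: int) -> bool:
    """Return True if the workers plus one coordinating task fit in the map slots.

    Hypothetical helper: `workers` is the value passed with -w, and
    `map_slots` stands for mapred.tasktracker.map.tasks.maximum.
    """
    return workers + 1 <= map_slots


# The three combinations mentioned in this thread.
for workers, slots in [(2, 2), (3, 4), (30, 4)]:
    verdict = "fits" if giraph_fits(workers, slots) else "does not fit"
    print(f"-w {workers} with {slots} map slots: {verdict}")
```

With 2 map slots, "-w 2" needs three tasks and does not fit, while "-w 3" with 4 slots does.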

If you are running Hadoop/Giraph on your personal machine, then I would recommend using 3 workers, and you should edit your conf/mapred-site.xml
to include the following configuration parameters (and then restart Hadoop):

 <property>
   <name>mapred.map.tasks</name>
   <value>4</value>
 </property>
 <property>
   <name>mapred.reduce.tasks</name>
   <value>4</value>
 </property>
 <property>
   <name>mapred.tasktracker.map.tasks.maximum</name>
   <value>4</value>
 </property>
 <property>
   <name>mapred.tasktracker.reduce.tasks.maximum</name>
   <value>4</value>
 </property>
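
To double-check which -w value a given mapred-site.xml allows, something like the following sketch could be used. It is an assumption-laden illustration: the helper name max_giraph_workers is made up, the XML is inlined rather than read from conf/mapred-site.xml, the fallback of 2 slots follows Avery's remark about the default, and it assumes one mapper task is reserved for coordination on a single machine.

```python
import xml.etree.ElementTree as ET

# Inlined stand-in for conf/mapred-site.xml; Hadoop wraps the
# <property> entries in a <configuration> root element.
MAPRED_SITE = """<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
</configuration>"""

SLOTS_KEY = "mapred.tasktracker.map.tasks.maximum"


def max_giraph_workers(xml_text: str, default_slots: int = 2) -> int:
    """Hypothetical helper: largest -w value that leaves one slot for coordination."""
    slots = default_slots  # assumed fallback when the property is absent
    for prop in ET.fromstring(xml_text).iter("property"):
        if prop.findtext("name", "").strip() == SLOTS_KEY:
            slots = int(prop.findtext("value", "").strip())
    return max(slots - 1, 0)


print(max_giraph_workers(MAPRED_SITE))
```

For the settings above this gives 3, which matches the recommendation of 3 workers with 4 map slots.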



