giraph-user mailing list archives

From Chuan Lei <leich...@gmail.com>
Subject PageRankBenchmark on Yarn
Date Mon, 08 Jul 2013 22:06:39 GMT
Hello everyone,

I have a few questions about running PageRankBenchmark on a Yarn
(2.0.5-alpha) cluster. I ran PageRankBenchmark with the following command:

=====
hadoop jar \
  /export/home/clei/giraph-1.0.0/giraph-core/target/giraph-1.0.0-for-hadoop-2.0.3-alpha-jar-with-dependencies.jar \
  org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 5000000 -w 3
=====

1. It seems to me that this is still submitted as a GiraphJob rather than as a
job submitted from GiraphYarnClient. If so, the PageRankBenchmark is still
hosted in mappers rather than in containers. Am I correct? If so,
how can I actually run the benchmark as a Yarn application?

2. PageRankBenchmark takes neither an input path nor an output path on
the command line. I was wondering how Giraph generates all 5 million
vertices for the above command (-V 5000000). Moreover, from the
log files, it seems that each worker tries to load all 5 million vertices at
the beginning instead of 1/3 of them. If so, why does each worker
consume the whole input instead of only taking a split of it? This is not
the case in the SimpleShortestPath example.

Any input on the above questions would be greatly appreciated.

Regards,
Chuan
