giraph-user mailing list archives

From Eli Reisman <apache.mail...@gmail.com>
Subject Re: PageRankBenchmark on Yarn
Date Mon, 15 Jul 2013 18:07:17 GMT
Hi Chuan,

The benchmarks do not run against the YARN profile, because they have a
different startup code path than the examples and applications do. You will
want to run SimplePageRankComputation (etc.) instead, and generate your own
input data. I can send you a silly generator I wrote in Scala that I use for
quick and dirty testing if you like. But it's silly; almost any other data
source you come up with will be as good or better :)
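
For a sense of what such a throwaway generator might look like, here is a
minimal sketch in Python. It is not the Scala generator mentioned above, just
an illustration. It assumes the job reads one vertex per line in the JSON
layout [vertexId, vertexValue, [[targetId, edgeValue], ...]], which is my
understanding of what JsonLongDoubleFloatDoubleVertexInputFormat expects;
check that against the input format you actually configure. The function
names, the output file name and the graph sizes are all made up for the
example.

```python
import json
import random


def generate_graph_lines(num_vertices, edges_per_vertex, seed=42):
    """Yield one line per vertex: [id, value, [[target, weight], ...]].

    The line layout is an assumption about the vertex input format in use;
    adjust it if your job is configured with a different one.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same graph
    # A vertex cannot have more distinct non-self targets than (n - 1).
    out_degree = min(edges_per_vertex, num_vertices - 1)
    for vertex_id in range(num_vertices):
        targets = set()
        while len(targets) < out_degree:
            candidate = rng.randrange(num_vertices)
            if candidate != vertex_id:  # skip self-loops
                targets.add(candidate)
        edges = [[target, 1.0] for target in sorted(targets)]
        yield json.dumps([vertex_id, 1.0, edges], separators=(",", ":"))


def write_graph(path, num_vertices, edges_per_vertex, seed=42):
    """Write the generated graph to a local text file, one vertex per line."""
    with open(path, "w") as handle:
        for line in generate_graph_lines(num_vertices, edges_per_vertex, seed):
            handle.write(line + "\n")


if __name__ == "__main__":
    # Hypothetical file name and sizes, purely for illustration.
    write_graph("pagerank-input.txt", num_vertices=1000, edges_per_vertex=5)
```

The resulting file would still need to be copied into HDFS and passed to the
job as its vertex input path. Every vertex gets the same out-degree and
random targets, so it is only good for smoke tests, not for realistic
benchmarking.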

GiraphJob is confusing because, if you look at GiraphRunner, we sort of
"hijack" the GiraphJob into a YARN job using munge flags (see the variable
called "job" at the bottom of the GiraphRunner listing).

If you see Mappers in your stack trace or logs, you are probably dealing
with a run based on MRv2 on top of a YARN cluster (so Giraph still thinks it's
running on Mappers, and in this case it will be!)



On Mon, Jul 8, 2013 at 3:06 PM, Chuan Lei <leichuan@gmail.com> wrote:

> Hello everyone,
>
> I have a few questions regarding running PageRankBenchmark on a Yarn
> (2.0.5-alpha) cluster. I ran PageRankBenchmark with the following command.
>
> =====
> hadoop jar
> /export/home/clei/giraph-1.0.0/giraph-core/target/giraph-1.0.0-for-hadoop-2.0.3-alpha-jar-with-dependencies.jar
> org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 5000000 -w 3
> =====
>
> 1. It seems to me that this is still submitted as a GiraphJob, not as a
> job submitted from GiraphYarnClient. If so, the PageRankBenchmark is still
> hosted in mappers rather than containers. Am I correct? If I am correct,
> how can I actually run the benchmark as a Yarn application?
>
> 2. The PageRankBenchmark takes neither an input nor an output path
> from the command line. I was wondering how Giraph generates all 5 million
> vertices according to the above command (-V 5000000). Moreover, from the
> log files, it seems that each worker tries to load all 5 million vertices at
> the beginning instead of 1/3 of these vertices. In this case, why does each
> worker consume all inputs instead of only taking a split of the input? It is
> not the case in the SimpleShortestPath example.
>
> Any inputs on the above questions would be greatly appreciated.
>
> Regards,
> Chuan
>
>
>
