giraph-user mailing list archives

From: Matt Molek <mpmo...@gmail.com>
Subject: Modifying a benchmark to use real input
Date: Thu, 23 May 2013 21:22:07 GMT
Hi,

I'm just getting started with Giraph and am struggling a bit to understand
what exactly is needed to run a minimal Giraph computation on real data
rather than on the PseudoRandomVertexInputFormat.

Apologies if this is covered somewhere in the docs or mailing list
archives. I looked but couldn't find anything that applies to the current
version, and I couldn't work out exactly how things have changed between
versions. Some older code I tried was clearly incompatible with the
current version.

Trying to learn by example, I copied the current
o.a.g.benchmark.ShortestPathsBenchmark and
o.a.g.benchmark.ShortestPathsComputation into my own project and modified
them to run on their own, without GiraphBenchmark and BenchmarkOption.
Here is the new ShortestPathsBenchmark I ended up with:
http://pastebin.com/h3rH6jTm

With the PseudoRandomVertexInputFormat and some hard-coded values for
aggregateVertices and edgesPerVertex, this runs fine from my jar with the
command:

    hadoop jar giraph-testing-jar-with-dependencies.jar \
        modified_benchmarks.ShortestPathsBenchmark --workers 10

Now I'd like to use JsonLongDoubleFloatDoubleVertexInputFormat with some
real data, but I see no way to specify the input path. If this were plain
Hadoop, I'd expect to be able to write something like:

    JsonLongDoubleFloatDoubleVertexInputFormat.addInputPath(job, new Path("/some/path"));

That isn't available, though. Could someone point me in the right
direction?
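For reference, this is roughly how I've been preparing the data itself. My understanding (which may be wrong) is that JsonLongDoubleFloatDoubleVertexInputFormat expects one vertex per line as a JSON array of the form [vertexId, vertexValue, [[targetId, edgeWeight], ...]], for example [0,0,[[1,1],[3,3]]]. The sketch below assumes that layout and converts a plain weighted edge list into such lines. EdgeListToJson and its Edge type are my own hypothetical helpers, not part of Giraph.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper (not part of Giraph): turns a weighted edge list into
 * one JSON line per vertex, using the ASSUMED layout
 * [vertexId, vertexValue, [[targetId, edgeWeight], ...]].
 */
public class EdgeListToJson {

    /** A directed, weighted edge: source -> target. */
    public static final class Edge {
        final long source;
        final long target;
        final float weight;

        public Edge(long source, long target, float weight) {
            this.source = source;
            this.target = target;
            this.weight = weight;
        }
    }

    /**
     * Groups edges by source vertex and emits one line per vertex, in
     * ascending id order. Vertices that only appear as edge targets get a
     * line with an empty edge list, so every referenced vertex is present.
     */
    public static List<String> toJsonLines(List<Edge> edges, double initialValue) {
        // TreeMap keeps vertex ids sorted, which makes the output deterministic.
        Map<Long, StringBuilder> outEdges = new TreeMap<>();
        for (Edge e : edges) {
            StringBuilder sb = outEdges.computeIfAbsent(e.source, k -> new StringBuilder());
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append('[').append(e.target).append(',').append(e.weight).append(']');
            // Make sure the target vertex also gets its own line.
            outEdges.putIfAbsent(e.target, new StringBuilder());
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Long, StringBuilder> entry : outEdges.entrySet()) {
            lines.add("[" + entry.getKey() + "," + initialValue + ",[" + entry.getValue() + "]]");
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Edge> edges = new ArrayList<>();
        edges.add(new Edge(0, 1, 1f));
        edges.add(new Edge(0, 3, 3f));
        edges.add(new Edge(1, 2, 2f));
        // Print one JSON line per vertex, ready to be written to a text file.
        for (String line : toJsonLines(edges, 0.0)) {
            System.out.println(line);
        }
    }
}
```

Running main prints four lines, starting with [0,0.0,[[1,1.0],[3,3.0]]]; vertices 2 and 3 only appear as targets, so they come out with empty edge lists. The part I still can't figure out is how to point the job at the resulting file.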

Am I going about this all wrong?

Thanks for any help,
Matt
