giraph-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Magyar, Bence (US SSA)" <bence.mag...@baesystems.com>
Subject RE: High-level questions about the ShortestPathsBenchmark example that ships with Giraph
Date Mon, 08 Oct 2012 02:50:36 GMT
Hi Eli,

Well, I've followed your advice and have gotten the SimpleShortestPathsVertex example running
using the GiraphRunner.  I am using the following command to launch the job:

./giraph ../target/giraph.jar org.apache.giraph.examples.SimpleShortestPathsVertex -if org.apache.giraph.io.JsonLongDoubleFloatDoubleVertexInputFormat
-ip /user/hduser/shortestPathsInputGraph -of org.apache.giraph.io.JsonLongDoubleFloatDoubleVertexOutputFormat
-op /user/hduser/shortestPathsOutput -w 3

As my input graph, I am using Avery's sample input files from:

http://ece.northwestern.edu/~aching/shortestPathsInputGraph.tar.gz

After the job completes, my output matches the output referenced @ https://github.com/aching/Giraph/wiki/Shortest-Paths-Example

(See below)

hduser ~/giraph-src/bin $ hadoop fs -cat shortestPathsOutput/part*

[5,1000,[[6,500]]]
[14,9100,[[0,1400]]]
[8,2800,[[9,800]]]
[2,100,[[3,200]]]
[11,5500,[[12,1100]]]
[7,2100,[[8,700]]]
[1,0,[[2,100]]]
[10,4500,[[11,1000]]]
[13,7800,[[14,1300]]]
[4,600,[[5,400]]]
[6,1500,[[7,600]]]
[0,10500,[[1,0]]]
[9,3600,[[10,900]]]
[12,6600,[[13,1200]]]
[3,300,[[4,300]]]

However, I don't quite understand how to interpret this output.  The SimpleShortestPathsVertex
should find the shortest path from (by default) vertex "1" to every other node.
How should the above output be interpreted?  The ShortestPath example on the wiki also stops
short of explaining this point.  Am I using the wrong output format?

Thanks!
Bence
________________________________
From: Eli Reisman [mailto:apache.mailbox@gmail.com]
Sent: Monday, September 24, 2012 6:05 PM
To: user@giraph.apache.org
Subject: Re: High-level questions about the ShortestPathsBenchmark example that ships with
Giraph

Sorry, the Giraph website is a bit out of date regarding the user-configurable application
code. The benchmark applications are meant for just that, and are not written to process input
or output data or results. The code you are looking for is in the examples/ dir. These are
applications (*Vertex classes) in Giraph. Regarding code from the examples/ directory: you
can run it at the command line using the "giraph" script in the bin/ dir. There are many command
line options (including for IO formats from the io/ dir) and input/output paths in your HDFS
for your data. Until better docs are up (soon, sorry!) your best bet is to read some of the
example apps in the examples/ and io/ dirs and read GiraphRunner.java and bin/giraph script
to get a feel for how user-configured command-line runs are performed. You might also then
feel comfortable writing some of your own application code.

Sorry about the confusion, we will be posting better documentation of application-style runs
ASAP.
On Mon, Sep 24, 2012 at 1:50 PM, Magyar, Bence (US SSA) <bence.magyar@baesystems.com<mailto:bence.magyar@baesystems.com>>
wrote:
Hello Giraph User Community,

( I am re-posting this question - I think I tried posting this before I confirmed my registration.
 Please pardon if this message is a duplicate )

This is my first post to this mailing list - I'm interested in learning more about Giraph
and to do that I checked out the latest source code from https://svn.apache.org/repos/asf/giraph/trunk
and built it with maven.

I am now running the shortestPathBenchMark example that ships with Giraph and have a few "high-level"
questions:
For the sake of this discussion, I am running the example with the following arguments:

hadoop jar giraph.jar org.apache.giraph.benchmark.ShortestPathsBenchmark -c 1 -e 3 -v -V 50000
-w 4

The example takes about 90 seconds to complete on my 4-node hadoop cluster and I don't see
any errors or issues.


1.      In computing a Dijkstra shortest path, we are looking for the shortest path from one
node to another.  What does ShortestPathsBenchmark use as the "starting" node?  The "ending"
node?

2.      What edge weights are being used?  The arguments don't allow me to specify them.

3.      Does ShortestPathsBenchmark produce any output data inside HDFS upon completion of
this example, or is the example purely meant to visually illustrate processing time on my
cluster?

4.      Can I feed ShortestPathsBenchmark my own graph?

5.      In the example above, I have specified 3 edges per vertex.  If I were to specify only
2 edges per vertex, am I not effectively dealing with a graph that most closely resembles
a "linked list"?  When I set -e=2, the processing time is still somewhat comparable to -e
= 3.  Shouldn't the graph be much simpler?


I have seen the ShortestPathExample @
https://cwiki.apache.org/confluence/display/GIRAPH/Shortest+Paths+Example

and I was planning on working through that example as well, but I thought I'd ask about the
benchmarking example first.

Thanks!


Bence


Mime
View raw message