giraph-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Eli Reisman <apache.mail...@gmail.com>
Subject Re: High-level questions about the ShortestPathsBenchmark example that ships with Giraph
Date Thu, 11 Oct 2012 17:13:16 GMT
Looking at the code for the benchmarks/ and the examples/ version, you can
see that the current minDistance (the V value) is set in each superstep
according to the minimum distances from Source received from all neighbors.
If a neighbor's minDistance gets updated, it propagates the "news" to all
its neighbors via its own out-edges, adding the cost of each edge traversal
to its own minDistance so each neighbor can evaluate the cost of being
passed through from Source to itself from that edge, and record in its own
minDistance any better results than it already stores. So, the vertex value
should be the distance from source for each vertex in the graph when the
supersteps have completed.

I think the problem here is the output format. Figure out which value is
the V (vertex value) in the output, and you'll have your answer. What this
doesn't seem to do is give you the chance to backtrack a path from V to
Source, you just know the best cost-to-Source from each V.

If anyone else can confirm/deny these allegations, please jump in. Thanks!


On Wed, Oct 10, 2012 at 7:53 PM, Magyar, Bence (US SSA) <
bence.magyar@baesystems.com> wrote:

>  Hi Eli,
> Unfortunately that interpretation of the data doesn't work. Avery's
> example input simply defines a graph that looks like a circular linked
> list. The shortest path from node 14 to node 0 is 1400, but the shortest
> path from node 5 to node 0 is not 1000- it will be the sum of all edge
> costs between node 5,6,7 etc all the way through 14 until the graph wraps
> around back to node 0. The resulting output looks exactly like the input.
> That can't be right.
>
> Is there any way the documented example on the wiki could be updated to
> briefly describe how to interpret the result? I can't imagine I'm the only
> one having trouble understanding this...
>
> I am going to try experimenting with even simpler graphs to see if I cant
> figure out how this is all working.
>
> Thanks!
>
>
>
> Sent from phone
>
>
> -----Original Message-----
> *From: *Eli Reisman [apache.mailbox@gmail.com]
> *Sent: *Wednesday, October 10, 2012 05:34 PM Eastern Standard Time
> *To: *user@giraph.apache.org
> *Subject: *Re: High-level questions about the ShortestPathsBenchmark
> example that ships with Giraph
>
> I think maybe the problem is the interpretation of the printout, which
> looks like it includes the out-edge ID's and values as well as the vertex
> value. I think the first two numbers in each outer bracket are the ID and
> VALUE (I and V generic types from application code) you want to read, the
> other stuff should be just as it was in the input as those are (I think)
> edge id's and values (E generic type from you Vertex application code)
>
> So the V (the 2nd value in the outer brackets for each printed line)
> should be the cost to get to your Source vertex from that vertex.
>
> If not, I need to look back at the IO Format and the example code, I'm
> just guessing based on memory.
> On Tue, Oct 9, 2012 at 5:32 AM, Magyar, Bence (US SSA) <
> bence.magyar@baesystems.com> wrote:
>
>>  Hi Eli, ****
>>
>> ** **
>>
>> I’ve found the past email I think you are referring to.  I found an email
>> from Friday 27 July, 2012.  In his email, Amir A. posed a similar question
>> about how to interpret the output from the ShortestPath Example.  Amir
>> wrote:****
>>
>> ** **
>>
>> “Moreover, looking at  the output file of shortestpath example provided**
>> **
>>
>> on the website, it seems that it is exactly the same like input files****
>>
>> combined together.****
>>
>> ** **
>>
>> Both the output file and the input files (put together) look a like.****
>>
>> Is this correct and the expected result?”****
>>
>> ** **
>>
>> Unfortunately, I don’t see a response to his email in the archives.  Can
>> you please quickly explain how the ShortestPath output should be
>> interpreted?****
>>
>> ** **
>>
>> Let’s just look at a quick snippet of the output:****
>>
>> hduser ~/giraph-src/bin $ hadoop fs -cat shortestPathsOutput/part*****
>>
>> [5,1000,[[6,500]]]****
>>
>> [14,9100,[[0,1400]]]****
>>
>> ** **
>>
>> I know that this is:  node 5 (with value 14) has an edge to node 6 with
>> edge cost 500.****
>>
>> And:  node 14 (with value 9100) has an edge to node 0 with an edge cost
>> of 1400.****
>>
>> ** **
>>
>> But if the ShortestPathExample computes the shortest path from node 0 to
>> every other node, I am expecting to see output like:****
>>
>> ** **
>>
>> [0, 10500, [1, <edge-cost-to-1> ]****
>>
>> [0, 10500, [2, <edge-cost-to-2> ]****
>>
>> [0, 10500, [3, <edge-cost-to-3> ]****
>>
>> Etc…****
>>
>> ** **
>>
>> Am I thinking about this the right way?****
>>
>> ** **
>>
>> Bence****
>>
>> ****
>>
>> *From:* Eli Reisman [mailto:apache.mailbox@gmail.com]
>> *Sent:* Monday, October 08, 2012 6:48 PM
>>
>> *To:* user@giraph.apache.org
>> *Subject:* Re: High-level questions about the ShortestPathsBenchmark
>> example that ships with Giraph****
>>
>>  ** **
>>
>> No you've got it right. There's another mail from a few months back on
>> the mailing list that explains the output. I'm glad to see its working for
>> you!
>>
>> Eli
>>
>> ****
>>
>> On Sun, Oct 7, 2012 at 7:50 PM, Magyar, Bence (US SSA) <
>> bence.magyar@baesystems.com> wrote:****
>>
>> Hi Eli, ****
>>
>>  ****
>>
>> Well, I’ve followed your advice and have gotten the
>> SimpleShortestPathsVertex example running using the GiraphRunner.  I am
>> using the following command to launch the job:****
>>
>>  ****
>>
>> ./giraph ../target/giraph.jar
>> org.apache.giraph.examples.SimpleShortestPathsVertex -if
>> org.apache.giraph.io.JsonLongDoubleFloatDoubleVertexInputFormat -ip
>> /user/hduser/shortestPathsInputGraph -of
>> org.apache.giraph.io.JsonLongDoubleFloatDoubleVertexOutputFormat -op
>> /user/hduser/shortestPathsOutput -w 3****
>>
>>  ****
>>
>> As my input graph, I am using Avery’s sample input files from:****
>>
>>  ****
>>
>> http://ece.northwestern.edu/~aching/shortestPathsInputGraph.tar.gz****
>>
>>  ****
>>
>> After the job completes, my output matches the output referenced @
>> https://github.com/aching/Giraph/wiki/Shortest-Paths-Example****
>>
>>  ****
>>
>> (See below)****
>>
>>  ****
>>
>> hduser ~/giraph-src/bin $ hadoop fs -cat shortestPathsOutput/part*****
>>
>>  ****
>>
>> [5,1000,[[6,500]]]****
>>
>> [14,9100,[[0,1400]]]****
>>
>> [8,2800,[[9,800]]]****
>>
>> [2,100,[[3,200]]]****
>>
>> [11,5500,[[12,1100]]]****
>>
>> [7,2100,[[8,700]]]****
>>
>> [1,0,[[2,100]]]****
>>
>> [10,4500,[[11,1000]]]****
>>
>> [13,7800,[[14,1300]]]****
>>
>> [4,600,[[5,400]]]****
>>
>> [6,1500,[[7,600]]]****
>>
>> [0,10500,[[1,0]]]****
>>
>> [9,3600,[[10,900]]]****
>>
>> [12,6600,[[13,1200]]]****
>>
>> [3,300,[[4,300]]]****
>>
>>  ****
>>
>> However, I don’t quite understand how to interpret this output.  The
>> SimpleShortestPathsVertex should find the shortest path from (by default)
>> vertex “1” to every other node.****
>>
>> How should the above output be interpreted?  The ShortestPath example on
>> the wiki also stops short of explaining this point.  Am I using the wrong
>> output format?  ****
>>
>>  ****
>>
>> Thanks!****
>>
>> Bence****
>>  ------------------------------
>>
>> *From:* Eli Reisman [mailto:apache.mailbox@gmail.com]
>> *Sent:* Monday, September 24, 2012 6:05 PM
>> *To:* user@giraph.apache.org
>> *Subject:* Re: High-level questions about the ShortestPathsBenchmark
>> example that ships with Giraph****
>>
>>  ****
>>
>> Sorry, the Giraph website is a bit out of date regarding the
>> user-configurable application code. The benchmark applications are meant
>> for just that, and are not written to process input or output data or
>> results. The code you are looking for is in the examples/ dir. These are
>> applications (*Vertex classes) in Giraph. Regarding code from the examples/
>> directory: you can run it at the command line using the "giraph" script in
>> the bin/ dir. There are many command line options (including for IO formats
>> from the io/ dir) and input/output paths in your HDFS for your data. Until
>> better docs are up (soon, sorry!) your best bet is to read some of the
>> example apps in the examples/ and io/ dirs and read GiraphRunner.java and
>> bin/giraph script to get a feel for how user-configured command-line runs
>> are performed. You might also then feel comfortable writing some of your
>> own application code.
>>
>> Sorry about the confusion, we will be posting better documentation of
>> application-style runs ASAP.****
>>
>> On Mon, Sep 24, 2012 at 1:50 PM, Magyar, Bence (US SSA) <
>> bence.magyar@baesystems.com> wrote:****
>>
>> Hello Giraph User Community,****
>>
>>  ****
>>
>> *( I am re-posting this question – I think I tried posting this before I
>> confirmed my registration.  Please pardon if this message is a duplicate )
>> *****
>>
>>  ****
>>
>> This is my first post to this mailing list – I’m interested in learning
>> more about Giraph and to do that I checked out the latest source code from
>> https://svn.apache.org/repos/asf/giraph/trunk****
>>
>> and built it with maven.****
>>
>>  ****
>>
>> I am now running the shortestPathBenchMark example that ships with Giraph
>> and have a few “high-level” questions:****
>>
>> For the sake of this discussion, I am running the example with the
>> following arguments:****
>>
>>  ****
>>
>> hadoop jar giraph.jar org.apache.giraph.benchmark.ShortestPathsBenchmark
>> -c 1 -e 3 -v -V 50000 -w 4****
>>
>>  ****
>>
>> The example takes about 90 seconds to complete on my 4-node hadoop
>> cluster and I don’t see any errors or issues. ****
>>
>>  ****
>>
>> 1.      In computing a Dijkstra shortest path, we are looking for the
>> shortest path from one node to another.  What does ShortestPathsBenchmark
>> use as the “starting” node?  The “ending” node?****
>>
>> 2.      What edge weights are being used?  The arguments don’t allow me
>> to specify them.****
>>
>> 3.      Does ShortestPathsBenchmark produce any output data inside HDFS
>> upon completion of this example, or is the example purely meant to visually
>> illustrate processing time on my cluster?****
>>
>> 4.      Can I feed ShortestPathsBenchmark my own graph?****
>>
>> 5.      In the example above, I have specified 3 edges per vertex.  If I
>> were to specify only 2 edges per vertex, am I not effectively dealing with
>> a graph that most closely resembles a “linked list”?  When I set –e=2, the
>> processing time is still somewhat comparable to –e = 3.  Shouldn’t the
>> graph be much simpler?   ****
>>
>>  ****
>>
>> I have seen the ShortestPathExample @ ****
>>
>> https://cwiki.apache.org/confluence/display/GIRAPH/Shortest+Paths+Example
>> ****
>>
>>  ****
>>
>> and I was planning on working through that example as well, but I thought
>> I’d ask about the benchmarking example first.****
>>
>>  ****
>>
>> Thanks!****
>>
>>  ****
>>
>>  ****
>>
>> Bence ****
>>
>>  ****
>>
>> ** **
>>
>
>

Mime
View raw message