incubator-hama-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Thomas Jungblut (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (HAMA-359) Development of Shortest Path Finding Algorithm
Date Fri, 01 Apr 2011 09:44:05 GMT

    [ https://issues.apache.org/jira/browse/HAMA-359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014469#comment-13014469
] 

Thomas Jungblut edited comment on HAMA-359 at 4/1/11 9:42 AM:
--------------------------------------------------------------

Thanks :)

Yesterday I setup the HBase and I decided to go a different way in the table layout.
The adjacency list will just contain the row numbers and the weights.
And you have a "lookup-table" that maps the vertex name, in my case the city name to the rownumber.

adjacency_list => rowId : rowId of the out-vertex and its weight
vertex_lookup => vertex : rowId

In the calculation of the shortest paths you don't need the name, so this won't blow up the
table too much.
But if you are really looking for the shortest way from London to Paris you have to translate
the city name to the rowid in order to start the shortest paths.
I guess this is a much better approach.

The first naive implementation of dijkstra would be:

You partition the row's to available grooms as boundaries (such as LIMIT's in a database,
e.G from row 0 to row 10k). I know that if you have an adjecency list, the distribution is
not perfectly balanced, I will provide a more advanced algorithm later.
Then you pass the grooms a message which tells them what is the current start-vertex.
Every groom is now searching for the lowest-weighted adjacent vertices and passes these messages
to a "master-groom" like in the PI Estimator.
After that you are syncing, the master will detect the globally clostest vertex and broadcasts
this as the new start-vertex to the grooms. This will update distances in an distance vector
D (source: the slides).
Of course you have to actually repartition the datasets, because you can strike-through the
row of the last start vertex, which will reduce the "height" of the list. 
Then this steps will go over and over again until you have reached the breaking condition.

Note that this is just a few thoughts, there are some questions left: 
How to store the already visited vertices?
How to store the direct ancestor of a vertex to reconstruct the shortest paths? Or in the
slides this is called the distance vector D.
Is this optimal in the case of a sync step for every vertex in the graph?
etc.

So I will have a few things left for the actual summer of code ;)
Thank you guys for your support.

      was (Author: thomas.jungblut):
    Thanks :)

Yesterday I setup the HBase and I decided to go a different way in the table layout.
The adjacency list will just contain the row numbers and the weights.
And you have a "lookup-table" that maps the vertex name, in my case the city name to the rownumber.

adjacency_list => rowId : rowId of the out-vertex and its weight
vertex_lookup => vertex : rowId

In the calculation of the shortest paths you don't need the name, so this won't blow up the
table too much.
But if you are really looking for the shortest way from London to Paris you have to translate
the city name to the rowid in order to start the shortest paths.
I guess this is a much better approach.

The first naive implementation of dijkstra would be:

You partition the row's to available grooms as boundaries (such as LIMIT's in a database,
e.G from row 0 to row 10k). I know that if you have an adjecency list, the distribution is
not perfectly balanced, I will provide a more advanced algorithm later.
Then you pass the grooms a message which tells them what is the current start-vertex.
Every groom is now searching for the lowest-weighted adjacent vertex and passes this message
to a "master-groom" like in the PI Estimator. 
After that you are syncing, the master will detect the globally clostest vertex and broadcasts
this as the new start-vertex to the grooms.
Of course you have to actually repartition the datasets, because you can strike-through the
row of the last start vertex, which will reduce the "height" of the list. 
Then this steps will go over and over again until you have reached the breaking condition.

Note that this is just a few thoughts, there are some questions left: 
How to store the already visited vertices?
How to store the direct ancestor of a vertex to reconstruct the shortest paths? Or in the
slides this is called the distance vector D.
Is this optimal in the case of a sync step for every vertex in the graph?
etc.

So I will have a few things left for the actual summer of code ;)
Thank you guys for your support.
  
> Development of Shortest Path Finding Algorithm
> ----------------------------------------------
>
>                 Key: HAMA-359
>                 URL: https://issues.apache.org/jira/browse/HAMA-359
>             Project: Hama
>          Issue Type: New Feature
>          Components: examples
>    Affects Versions: 0.2.0
>            Reporter: Edward J. Yoon
>              Labels: gsoc, gsoc2011, mentor
>             Fix For: 0.3.0
>
>   Original Estimate: 2016h
>  Remaining Estimate: 2016h
>
> The goal of this project is development of parallel algorithm for finding a Shortest
Path using Hama BSP.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message