[ https://issues.apache.org/jira/browse/HAMA359?page=com.atlassian.jira.plugin.system.issuetabpanels:commenttabpanel&focusedCommentId=13014469#comment13014469
]
Thomas Jungblut edited comment on HAMA359 at 4/1/11 9:45 AM:

Thanks :)
Yesterday I setup the HBase and I decided to go a different way in the table layout.
The adjacency list will just contain the row numbers and the weights.
And you have a "lookuptable" that maps the vertex name, in my case the city name to the rownumber.
adjacency_list => rowId : rowId of the outvertex and its weight
vertex_lookup => vertex : rowId
In the calculation of the shortest paths you don't need the name, so this won't blow up the
table too much.
But if you are really looking for the shortest way from London to Paris you have to translate
the city name to the rowid in order to start the shortest paths.
I guess this is a much better approach.
The first naive implementation of dijkstra would be:
You partition the row's to available grooms as boundaries (such as LIMIT's in a database,
e.G from row 0 to row 10k). I know that if you have an adjecency list, the distribution is
not perfectly balanced, I will provide a more advanced algorithm later.
Then you pass the grooms a message which tells them what is the current startvertex.
Every groom is now searching for the lowestweighted adjacent vertices and passes these messages
to a "mastergroom" like in the PI Estimator.
After that you are syncing, the master will detect the globally clostest vertex and broadcasts
this as the new startvertex to the grooms. This will update distances in an distance vector
D (source: the slides).
Of course you have to actually repartition the datasets, because you can strikethrough the
row of the last start vertex, which will reduce the "height" of the list.
Then this steps will go over and over again until you have reached the breaking condition.
Note that this is just a few thoughts, there are some questions left:
How to store the already visited vertices? Or in the slides this is called the distance vector
D.
How to store the direct ancestor of a vertex to reconstruct the shortest paths?
Is this optimal in the case of a sync step for every vertex in the graph?
etc.
So I will have a few things left for the actual summer of code ;)
Thank you guys for your support.
was (Author: thomas.jungblut):
Thanks :)
Yesterday I setup the HBase and I decided to go a different way in the table layout.
The adjacency list will just contain the row numbers and the weights.
And you have a "lookuptable" that maps the vertex name, in my case the city name to the rownumber.
adjacency_list => rowId : rowId of the outvertex and its weight
vertex_lookup => vertex : rowId
In the calculation of the shortest paths you don't need the name, so this won't blow up the
table too much.
But if you are really looking for the shortest way from London to Paris you have to translate
the city name to the rowid in order to start the shortest paths.
I guess this is a much better approach.
The first naive implementation of dijkstra would be:
You partition the row's to available grooms as boundaries (such as LIMIT's in a database,
e.G from row 0 to row 10k). I know that if you have an adjecency list, the distribution is
not perfectly balanced, I will provide a more advanced algorithm later.
Then you pass the grooms a message which tells them what is the current startvertex.
Every groom is now searching for the lowestweighted adjacent vertices and passes these messages
to a "mastergroom" like in the PI Estimator.
After that you are syncing, the master will detect the globally clostest vertex and broadcasts
this as the new startvertex to the grooms. This will update distances in an distance vector
D (source: the slides).
Of course you have to actually repartition the datasets, because you can strikethrough the
row of the last start vertex, which will reduce the "height" of the list.
Then this steps will go over and over again until you have reached the breaking condition.
Note that this is just a few thoughts, there are some questions left:
How to store the already visited vertices?
How to store the direct ancestor of a vertex to reconstruct the shortest paths? Or in the
slides this is called the distance vector D.
Is this optimal in the case of a sync step for every vertex in the graph?
etc.
So I will have a few things left for the actual summer of code ;)
Thank you guys for your support.
> Development of Shortest Path Finding Algorithm
> 
>
> Key: HAMA359
> URL: https://issues.apache.org/jira/browse/HAMA359
> Project: Hama
> Issue Type: New Feature
> Components: examples
> Affects Versions: 0.2.0
> Reporter: Edward J. Yoon
> Labels: gsoc, gsoc2011, mentor
> Fix For: 0.3.0
>
> Original Estimate: 2016h
> Remaining Estimate: 2016h
>
> The goal of this project is development of parallel algorithm for finding a Shortest
Path using Hama BSP.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
