incubator-hama-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Thomas Jungblut (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HAMA-359) Development of Shortest Path Finding Algorithm
Date Fri, 25 Mar 2011 10:29:07 GMT

    [ https://issues.apache.org/jira/browse/HAMA-359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011153#comment-13011153
] 

Thomas Jungblut commented on HAMA-359:
--------------------------------------

Hehe, I should've looked at the age of the files :o)

The advantage of using HBase is the following:
HBase is built on top of HDFS, so it is controlling the distribution of the data.
BUT this could be a large disadvantage too, in the case if a groom is not running on a server
where the data is actually stored.
In MapReduce this is called a non-local task, so you have to copy the data to the local datanode.


Maybe Edward can tell us why he decided to get rid of HBase and MapReduce at all.

Using a GraphDB and an interface like Blueprints is like using a MySQL Database with JDBC
inside a distributed environment. It is possible, but IMHO it is not optimal.

My scenario would be:

- you have vertices represented as edges inside of a SequenceFile in this style: inVertex;outVertex;weight;
and so on... (quite similar to graphbase)
-- the user could then simply setup a MapReduce job that connects with Blueprints and write
it into the SequenceFile.
- we build an index on top of this, let this simply be a hashmap with a vertex and its byte
offset in the sequence file
Graphbase could make this work for us.

- then we focus on some basic Dijkstra, so the weights are positive, no matter if the graph
is directed or not.
-- we see how we can distribute this over many machines
-- my first approach would be a master server that holds a shortest-path-matrix and let the
slaves calculate the distance between the vertices and after that communicate with the master,
it simply sums them up and updates the matrix.
- for negative weights and very large datasets we could implement a A* heuristic

This is not really optimal, I know that. But this would be my first approach on this topic.

So what are your thoughts on that?

> Development of Shortest Path Finding Algorithm
> ----------------------------------------------
>
>                 Key: HAMA-359
>                 URL: https://issues.apache.org/jira/browse/HAMA-359
>             Project: Hama
>          Issue Type: New Feature
>          Components: examples
>    Affects Versions: 0.2.0
>            Reporter: Edward J. Yoon
>              Labels: gsoc, gsoc2011, mentor
>             Fix For: 0.3.0
>
>   Original Estimate: 2016h
>  Remaining Estimate: 2016h
>
> The goal of this project is development of parallel algorithm for finding a Shortest
Path using Hama BSP.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message