hama-dev mailing list archives

From "Alois Cochard (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (HAMA-359) Development of Shortest Path Finding Algorithm
Date Fri, 25 Mar 2011 12:49:05 GMT

    [ https://issues.apache.org/jira/browse/HAMA-359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011181#comment-13011181 ]

Alois Cochard edited comment on HAMA-359 at 3/25/11 12:47 PM:
--------------------------------------------------------------

>>BUT this could be a large disadvantage too, in the case if a groom is not running on a server where the data is actually stored.
>> In MapReduce this is called a non-local task, so you have to copy the data to the local datanode

Exactly what I had in mind when I raised the problem of data locality.
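To make the locality trade-off concrete, here is a hypothetical sketch (not Hama's actual scheduler) that places each graph partition on a groom running on a host that already stores that partition's data, and otherwise falls back to a remote groom, which is the non-local case where the data has to be copied. The class and method names, and the assumption that replica locations are known up front, are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Hama's scheduler: place each graph partition on a
// groom whose host already stores that partition's data, if one exists.
final class LocalityAssigner {

    /** Outcome: partition id -> chosen groom host, plus the number of non-local placements. */
    static final class Assignment {
        final Map<String, String> partitionToGroom;
        final int nonLocalCount;

        Assignment(Map<String, String> partitionToGroom, int nonLocalCount) {
            this.partitionToGroom = partitionToGroom;
            this.nonLocalCount = nonLocalCount;
        }
    }

    /**
     * @param replicaHosts partition id -> hosts holding a copy of that partition's data
     * @param groomHosts   hosts on which a groom is running (must be non-empty)
     */
    static Assignment assign(Map<String, Set<String>> replicaHosts, List<String> groomHosts) {
        if (groomHosts.isEmpty()) {
            throw new IllegalArgumentException("no grooms available");
        }
        Map<String, String> placement = new LinkedHashMap<>();
        int nonLocal = 0;
        int fallback = 0; // round-robin cursor for non-local placements
        for (Map.Entry<String, Set<String>> e : replicaHosts.entrySet()) {
            String chosen = null;
            for (String groom : groomHosts) {
                if (e.getValue().contains(groom)) { // data-local: no copy needed
                    chosen = groom;
                    break;
                }
            }
            if (chosen == null) {
                // Non-local task: the partition's data has to be copied to this groom's host.
                chosen = groomHosts.get(fallback++ % groomHosts.size());
                nonLocal++;
            }
            placement.put(e.getKey(), chosen);
        }
        return new Assignment(placement, nonLocal);
    }
}
```

The sketch deliberately ignores load balancing (several local partitions can land on the same groom); a real scheduler would have to weigh locality against an even spread of work.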

>> Using a GraphDB and an interface like Blueprints is like using a MySQL Database with JDBC inside a distributed environment. It is possible, but IMHO it is not optimal.

+1, same conclusion here. I would even call it a horrible abstraction inversion.

>>>>I looked at the graph-hbase, but it's just a graph-friendly API layer on top of HBase. It'll make you feel complex.

So OK, it adds no value at all, and it would take away the flexibility of storing the adjacency
matrix the way you want.

To conclude, it seems more important to know *which* data structure to use than *where*
to store it.

Once you are sure which structure to use (e.g. an adjacency matrix), you can choose the best
system to store/access it (SequenceFile/HBase/...) and change it later if necessary without
impacting the algorithm.
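A minimal sketch of that separation, assuming a hypothetical `AdjacencyStore` interface: the algorithm only talks to the interface, so an in-memory matrix, an in-memory adjacency list, or a SequenceFile- or HBase-backed implementation (not shown here) could be swapped in without touching it. A plain sequential Dijkstra stands in for the eventual parallel algorithm; all names are illustrative.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical interface: the algorithm sees only this, not where the graph is stored.
interface AdjacencyStore {
    int vertexCount();

    /** Outgoing edges of v as neighbor -> non-negative weight. */
    Map<Integer, Integer> neighbors(int v);
}

// Backend 1: dense adjacency matrix in memory (0 = no edge, so weights must be positive).
final class MatrixStore implements AdjacencyStore {
    private final int[][] m;

    MatrixStore(int[][] m) { this.m = m; }

    public int vertexCount() { return m.length; }

    public Map<Integer, Integer> neighbors(int v) {
        Map<Integer, Integer> out = new HashMap<>();
        for (int u = 0; u < m.length; u++) {
            if (m[v][u] > 0) out.put(u, m[v][u]);
        }
        return out;
    }
}

// Backend 2: sparse adjacency list in memory.
final class ListStore implements AdjacencyStore {
    private final Map<Integer, Map<Integer, Integer>> adj = new HashMap<>();
    private final int n;

    ListStore(int n) { this.n = n; }

    ListStore addEdge(int from, int to, int weight) {
        adj.computeIfAbsent(from, k -> new HashMap<>()).put(to, weight);
        return this;
    }

    public int vertexCount() { return n; }

    public Map<Integer, Integer> neighbors(int v) { return adj.getOrDefault(v, Map.of()); }
}

final class ShortestPaths {
    /** Dijkstra written only against AdjacencyStore; unreachable vertices stay Long.MAX_VALUE. */
    static long[] dijkstra(AdjacencyStore g, int source) {
        long[] dist = new long[g.vertexCount()];
        Arrays.fill(dist, Long.MAX_VALUE);
        dist[source] = 0;
        // Queue entries are {distance, vertex}, ordered by distance.
        PriorityQueue<long[]> pq = new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0]));
        pq.add(new long[] {0, source});
        while (!pq.isEmpty()) {
            long[] top = pq.poll();
            int v = (int) top[1];
            if (top[0] > dist[v]) continue; // stale entry, a shorter path was already found
            for (Map.Entry<Integer, Integer> e : g.neighbors(v).entrySet()) {
                long nd = dist[v] + e.getValue();
                if (nd < dist[e.getKey()]) {
                    dist[e.getKey()] = nd;
                    pq.add(new long[] {nd, e.getKey()});
                }
            }
        }
        return dist;
    }
}
```

Both backends describe the same three-vertex graph in the check below and yield the same distances, which is the point: the storage choice can change while the algorithm stays put.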

Thanks!

      was (Author: alois.cochard):
    >>BUT this could be a large disadvantage too, in the case if a groom is not running on a server where the data is actually stored.
>> In MapReduce this is called a non-local task, so you have to copy the data to the local datanode

Exactly what I had in mind when I raised the problem of data locality.

>> Using a GraphDB and an interface like Blueprints is like using a MySQL Database with JDBC inside a distributed environment. It is possible, but IMHO it is not optimal.

+1, same conclusion here. I would even call it a horrible abstraction inversion.

>>>>I looked at the graph-hbase, but it's just a graph-friendly API layer on top of HBase. It'll make you feel complex.

So OK, it adds no value at all, and it would take away the flexibility of storing the adjacency
matrix the way you want.

To conclude, it seems more important to know *which* data structure to use than *where*
to store it.

Once you are sure which structure to use (e.g. an adjacency list), you can choose the best
system to store/access it (SequenceFile/HBase/...) and change it later if necessary without
impacting the algorithm.

Thanks!
  
> Development of Shortest Path Finding Algorithm
> ----------------------------------------------
>
>                 Key: HAMA-359
>                 URL: https://issues.apache.org/jira/browse/HAMA-359
>             Project: Hama
>          Issue Type: New Feature
>          Components: examples
>    Affects Versions: 0.2.0
>            Reporter: Edward J. Yoon
>              Labels: gsoc, gsoc2011, mentor
>             Fix For: 0.3.0
>
>   Original Estimate: 2016h
>  Remaining Estimate: 2016h
>
> The goal of this project is the development of a parallel algorithm for finding a shortest path using Hama BSP.
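For orientation, one common way to compute single-source shortest paths in a BSP model is a Bellman-Ford-style relaxation: in each superstep, every vertex whose distance just improved sends "distance + edge weight" to its neighbours, a barrier follows, and the computation halts once no messages are in flight. The sketch below is a single-process simulation of that idea, not code against Hama's BSP API; the class name, the round-robin vertex-to-peer partitioning, and the min-combining of messages are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Single-process simulation of BSP supersteps for single-source shortest paths.
// Not the Hama API: "peers" are just vertex partitions, and the barrier is the
// end of each iteration of the outer loop.
final class BspShortestPath {

    /** edges.get(v) holds {target, weight} pairs; weights are assumed non-negative. */
    static long[] run(List<List<int[]>> edges, int source, int numPeers) {
        int n = edges.size();
        long[] dist = new long[n];
        Arrays.fill(dist, Long.MAX_VALUE);
        // Messages delivered at the start of the next superstep: vertex -> best candidate distance.
        Map<Integer, Long> inbox = new HashMap<>();
        inbox.put(source, 0L);
        while (!inbox.isEmpty()) { // halt when no messages are in flight
            Map<Integer, Long> outbox = new HashMap<>();
            for (int peer = 0; peer < numPeers; peer++) {  // conceptually runs in parallel
                for (int v = peer; v < n; v += numPeers) { // vertex v is owned by peer v % numPeers
                    Long candidate = inbox.get(v);
                    if (candidate == null || candidate >= dist[v]) continue;
                    dist[v] = candidate; // improved, so notify the neighbours
                    for (int[] e : edges.get(v)) {
                        // Keep only the smallest candidate per target (a min "combiner").
                        outbox.merge(e[0], candidate + e[1], Long::min);
                    }
                }
            }
            inbox = outbox; // barrier: messages become visible in the next superstep
        }
        return dist;
    }
}
```

On a graph with edges 0→1 (4), 0→2 (1), 2→1 (2) and 1→3 (1), running from vertex 0 gives distances 0, 3, 1, 4, and a vertex with no incoming path keeps `Long.MAX_VALUE`. The number of supersteps is bounded by the number of edges on the longest shortest path, which is why this relaxation maps naturally onto BSP.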

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
