incubator-hama-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Miklos Erdelyi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HAMA-409) Add Pregel-like API
Date Tue, 12 Jul 2011 06:18:01 GMT

    [ https://issues.apache.org/jira/browse/HAMA-409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063739#comment-13063739
] 

Miklos Erdelyi commented on HAMA-409:
-------------------------------------


To get HAMA graph running from the attached archive please follow the below steps:
1) Unzip the archive in the root of a freshly checked out HAMA trunk.

2) Get fastutil (http://fastutil.dsi.unimi.it/) and put its jar under ${basedir}/lib. I needed
to put two other dependencies there because I could not find them in a public maven repository
(dsiutils and webgraph).

3) To run the example inlink counter hama-graph job, invoke:
java org.apache.hama.graph.GraphComputeRunner <path-to-partitioned-graph>/ <number-of-partitions>
org.apache.hama.graph.examples.InlinkCountComputer <max-iterations>

All the dependencies need to be on the classpath including the used HAMA distribution's config
directory. The partitioned graph needs to be accessible through NFS on all the cluster nodes.

A few thoughts on current implementation:

a) Graph input
Currently HipG's (www.cs.vu.nl/~ekr/hipg/) segmented graph loader is used for loading partitioned
graphs. A conversion util is provided (org.apache.hama.graph.format.ConvertFromWebGraphASCII.java)
which can convert from WebGraph's (http://webgraph.dsi.unimi.it/) ASCII format into the segmented
graph format.

This mechanism should be replaced with a builder-like approach employed by e.g. GoldenOrb:
based on some input specification a factory class dependent on the input format of the graph
should load vertexes into memory.
A good approach would be letting workers load data-local segments of the graph and do a partition-worker
assignment based on data-locality, after which loaded vertexes not belonging to one worker's
partition(s) could be sent to the proper workers via network.

b) Message buffering
In the current implementation messages are grouped by vertexes after a new superstep starts,
i.e., VertexComputerGeneric iterates through all messages recevied by the BSPPeer and puts
them into the queue of respective vertexes.
This memory copying could be avoided by letting the BSP framework do the grouping while receiving
the messages, i.e., by grouping messages by tag such that messages intended for the same vertex
ID will have the same tag.

c) Output
Currently nothing is output from the vertexes. However, it would be relatively easy to add,
e.g., by requiring that the Vertex class implement Writable and serializing all vertexes at
the end of certain supersteps. This would be a step towards fault-tolerance too.


> Add Pregel-like API
> -------------------
>
>                 Key: HAMA-409
>                 URL: https://issues.apache.org/jira/browse/HAMA-409
>             Project: Hama
>          Issue Type: New Feature
>          Components: bsp, examples
>    Affects Versions: 0.3.0
>            Reporter: Thomas Jungblut
>              Labels: graph
>             Fix For: 0.4.0
>
>         Attachments: hama-graph.zip
>
>
> According to what we've discussed on the mailing list, we should add a Pregel-like API.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message