hama-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Apache Wiki <wikidi...@apache.org>
Subject [Hama Wiki] Update of "DynamicGraphs" by AnastasisAndronidis
Date Sun, 16 Jun 2013 22:00:32 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hama Wiki" for change notification.

The "DynamicGraphs" page has been changed by AnastasisAndronidis:

New page:
== Dynamic Addition and Removal of Vertexes in Graph API ==

=== Introduction ===
Nowadays more and more people turn to distributed environments to store and analyze Big Data.
For that reason there is a growing need in both achieving efficiency and new features. Hama
is an upcoming project that gives to researchers and analysts a way to handle big amounts
of data, through the BSP computation model.
A very interesting feature of the Hama project is the Graph API for graph analysis. A lot
of scientists and companies represent and manage their data with the use of one or more of
the many different kinds of graphs (e.g. incremental graphs).
This article will present some new features on the Graph API of Hama, on how to create a dynamic
graphs on run time. It also will serve as a technical document of the internals of those features.

=== Description ===
At the time being, we will discuss only two different operations (addition and deletion) from
both points of the end-user and internal architecture.

Some points that we need to keep in mind, are:
 1. Deletion/Addition is happening after a super step. ''We are providing methods inside a
vertex instance that create/delete a vertex''
 2. New vertexes needs to be distributed through partitioner to be placed on right peers.
 3. Keep in mind that a new created vertex and old vertexes, will have the same super step
counter. ''Various algorithms change their behavior though time, an example with an internal
counter to serve as a state is needed''

=== Implementation Details ===
The `GraphJobRunner` class, contains the major steps for the graph computation. It starts
with an initial setup, where the vertexes are loaded to memory, then there is the main computation
loop, where the supersteps occur, and in the end there is the cleanup, where of the graph
and the results are written to HDFS.

Our main target (concerning the operations +/-) is to enable the functionality of `GraphJobRunner::loadVertices()`
function during a superstep so the user will be able to call a function during a superstep
(e.g. `addVertex(value, List<> edges)`) to add a vertex. The user will also be able
to call a delete method (e.g. `deleteVertex(vertexID)`) to remove vertexes.

An important note here is that adding or removing vertexes is happening in every sync of the
peers. This means that during a superstep the additions or deletions will be locally stored
and after the end of the superstep will be committed, through messages, to the peers through
the partitioner. 

View raw message