incubator-hama-commits mailing list archives

From Apache Wiki <>
Subject [Hama Wiki] Update of "RoadMap" by HyunsikChoi
Date Fri, 19 Mar 2010 11:49:17 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hama Wiki" for change notification.

The "RoadMap" page has been changed by HyunsikChoi.
The comment on this change is: Added the reason why we have to adopt BSP for Hama.


  = Long-term Issues =
  We plan to redesign Hama around the BSP (Bulk Synchronous Parallel) model and to target shared-nothing
systems consisting of several thousand commodity servers, which are generally called cloud
computing environments.
  == Why BSP? ==
- (Working)
+ With respect to the graph package, BSP is also necessary for Hama to process graph data efficiently
in shared-nothing architectures. The essence of graph data is the connectivity between vertices.
During processing, Hama needs not only a vertex's own data but also the data of its adjacent
vertices. Assume that we have a graph data set partitioned into cohesive subgraphs, that
is, adjacent vertices are stored on the same physical storage or as close to each other as possible.
Even with such well-partitioned graphs, MapReduce cannot exploit this locality, since
it reads its input data sequentially and cannot control its input data. In addition, its partitioner
hashes the input data, regardless of adjacency. By contrast, the BSP model enables graph processing
to be performed efficiently while preserving the locality of graph data.
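The locality argument above can be made concrete with a small sketch. It is only an illustration under stated assumptions, not Hama code: the class `PartitionLocalityDemo`, the toy graph, and both placement functions are hypothetical. It counts how many edges connect vertices placed on different partitions, once with hash-style placement and once with a placement that keeps each cohesive subgraph together.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical illustration; none of these names come from Hama or Hadoop.
class PartitionLocalityDemo {
    // Toy undirected graph: two 4-vertex rings (0-3 and 4-7) joined by one bridge edge (3-4).
    static final int[][] EDGES = {
        {0, 1}, {1, 2}, {2, 3}, {3, 0},
        {4, 5}, {5, 6}, {6, 7}, {7, 4},
        {3, 4}
    };

    // Counts edges whose two endpoints are assigned to different partitions.
    // Each such edge means a vertex must fetch its neighbour's data remotely.
    static int cutEdges(int[][] edges, IntUnaryOperator partitionOf) {
        int cut = 0;
        for (int[] e : edges) {
            if (partitionOf.applyAsInt(e[0]) != partitionOf.applyAsInt(e[1])) {
                cut++;
            }
        }
        return cut;
    }

    // Hash-style placement: the partition depends only on the vertex id's hash,
    // so adjacency is ignored.
    static int hashPartition(int vertex, int numPartitions) {
        return Math.floorMod(Integer.hashCode(vertex), numPartitions);
    }

    // Locality-preserving placement (assumption: consecutive ids belong to the
    // same cohesive subgraph, as in the toy graph above).
    static int rangePartition(int vertex, int verticesPerPartition) {
        return vertex / verticesPerPartition;
    }

    public static void main(String[] args) {
        int hashed = cutEdges(EDGES, v -> hashPartition(v, 2));
        int local = cutEdges(EDGES, v -> rangePartition(v, 4));
        System.out.println("cut edges, hash placement:  " + hashed); // prints 9
        System.out.println("cut edges, range placement: " + local);  // prints 1
    }
}
```

On this toy graph, hash placement separates the endpoints of all nine edges, while the locality-preserving placement separates only the bridge edge. Real graphs and real partitioners will differ; the sketch only shows why a model that keeps a pre-partitioned graph in place has fewer remote neighbour lookups.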
  === Design Considerations ===
   * Fault Tolerance - Hama aims to run on several thousand commodity servers, so
faults are to be expected. In addition, Hama targets large-scale processing that generally
takes a long time, ranging from a few minutes to several hours. It is therefore important that
Hama can finish its given jobs even when faults occur during processing. Otherwise, Hama has to
restart all jobs.
   * Heterogeneity - 
