hadoop-common-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hamburg" by HyunsikChoi
Date Sat, 25 Jul 2009 11:26:37 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by HyunsikChoi:
http://wiki.apache.org/hadoop/Hamburg

The comment on the change is:
Improved the motivation

------------------------------------------------------------------------------
  ## page was renamed from Hambrug
  
  == Motivation ==
- The MapReduce (M/R) programming model is inappropriate to problems based on data where each
portion depends on many other potions and their relations are very complicated. It is because
these problems cause as follows:
+ The MapReduce (M/R) programming model is inappropriate to graph problems because of the
following reasons:
  
   ''Do you know other situations that might fall into what you are describing above?''
  
+  * '''!MapReduce does not support traversing graph''' – A mapper reads input data sequentially,
and it can’t control its input data. In contrast, most of the graph problems are based on
walking vertices step by step. Walking vertices is to expand adjacent vertices from a given
vertex.  This operation is only available if current input data can be determined by the previous
operation. In MapReduce, however, the previous operation cannot affect the input data of the
next operation. Traversing a vertex is the most basic premitive operation in graph operations.
Consequently, graph processing with MapReduce is very limited. In order to come over this
limit, we have to avoid traverse of graph in order to solve graph problems ([http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=5076317
Graph Twiddling in a MapReduce World]) or have to use many M/R iterations each time walk vertices
([http://blog.udanax.org/2009/02/breadth-first-search-mapreduce.html Breadth-First Search
(B
 FS) & MapReduce]).
+  * '''!MapReduce limits to assigning one reducer''' - In a MapReduce problem on graph data,
assigning appropriate reducers according to their relations of partitioned graphs is very
hard. Assigning only one reducer is a straightforward way to solve their complex relations,
but it is apparent to cause deterioration of scalability.
+  * '''More complicated M/R program''' - To avoid graph traverse or the limit of one reduce,
the M/R programs would be inevitablely complicated with code to communicate data among data
nodes.
-  * limit to assigning one reducer
-   * In case that the relations of data are very complex, assigning intermediate data to
appropriate reducers by considering their dependency of partitioned graphs may be very hard.
Assigning only one reducer is a straightway to solve complexity dependency, but it is apparent
to cause deterioration of scalability.
-  * many M/R iterations
-  * or make an M/R program more complicated
-   * To avoid above two inefficient methods, the M/R program will be complicated with code
to communicate data among data nodes.
  
- These problems are very common in many areas; especially, many graph problems are exemplary.
Therefore, we try to propose a new programming model, named Hamburg. The main objective of
Hamburg is to support well the problems based on data having complexity dependency one another.
This page is an initial work of our proposal.
+ Therefore, we need a new programming model for graph processing on Hadoop.
  
   ''We should survey other areas -- Edward J.''
  
  == Goal ==
+  * Support graph traverse
+  * Support a simple programming interface dealing graph data
-  * Follow scalability concept of shared-nothing architecture
+  * Follow the scalability concept of shared-nothing architecture
-  * Support a simple programming model to compute complex relations such as, graph data.
  
  == Hamburg ==
  Hambrug is an alternative to M/R programming model. It is based on bulk synchronization
parallel (BSP) model. Like M/R, Hambrug takes advantages from shared-nothing architecture
(SN), so I expect that it will also show scalablity without almost degradation of performance
as the number of participant nodes increases.

Mime
View raw message