Return-Path: X-Original-To: apmail-incubator-hama-commits-archive@minotaur.apache.org Delivered-To: apmail-incubator-hama-commits-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 914E59228 for ; Wed, 18 Apr 2012 02:24:46 +0000 (UTC) Received: (qmail 71844 invoked by uid 500); 18 Apr 2012 02:24:46 -0000 Delivered-To: apmail-incubator-hama-commits-archive@incubator.apache.org Received: (qmail 71821 invoked by uid 500); 18 Apr 2012 02:24:46 -0000 Mailing-List: contact hama-commits-help@incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: hama-dev@incubator.apache.org Delivered-To: mailing list hama-commits@incubator.apache.org Received: (qmail 71812 invoked by uid 99); 18 Apr 2012 02:24:46 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 18 Apr 2012 02:24:46 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.131] (HELO eos.apache.org) (140.211.11.131) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 18 Apr 2012 02:24:43 +0000 Received: from eos.apache.org (localhost [127.0.0.1]) by eos.apache.org (Postfix) with ESMTP id 3CFB96EA; Wed, 18 Apr 2012 02:24:22 +0000 (UTC) MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable From: Apache Wiki To: Apache Wiki Date: Wed, 18 Apr 2012 02:24:21 -0000 Message-ID: <20120418022421.10193.20468@eos.apache.org> Subject: =?utf-8?q?=5BHama_Wiki=5D_Update_of_=22GraphPackage=22_by_edwardyoon?= Auto-Submitted: auto-generated Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hama Wiki" for chan= ge notification. The "GraphPackage" page has been changed by edwardyoon: http://wiki.apache.org/hama/GraphPackage?action=3Ddiff&rev1=3D6&rev2=3D7 + Please update http://people.apache.org/~edwardyoon/site/hama_graph_tutori= al.html - =3D The Graph Package (Angrapa) =3D - The graph package, called Angrapa, is an large-scale graph data managemen= t framework for analytical processing. It is still in heavy development. An= grapa will employ massive parallelism on Hadoop, and It aims to achieve the= scalability for processing tera bytes or peta bytes graph data. Angrapa wi= ll be used in a variety of scientific and industrial areas, such as data mi= ning, machine learning, information retrieval, bioinformatics, and social n= etworks, required to process large-scale graph data. = - =3D The Main Goal =3D - * Easy APIs familiar to graph features - * Storing techniques and the data communication method (i.e., BSP) witho= ut deterioration of graph data locality - = - =3D An Overview of the Angrapa =3D - The architecture of angrapa is similar to that of !MapReduce except it is= founded on the [[BulkSynchronizationParallel| BSP]]. It consists of one ma= ster and many walkers. One master corresponds to a jobtracker of !MapReduce= , and many walkers correspond to task trackers. Like !MapReduce, angrapa wi= ll be carried out on data sets on HDFS or HBase. - = - For processing, programs written in angrapa will be automatically paralle= lized and executed on walkers. The processing of angrapa consists of a sequ= ence of parallel supersteps. Each superstep is also subdivided into three o= rdered phases, consisting of local computation in each walker; communicatio= ns among the walkers, leading to transfers of intermediate data between wal= kers; and a barrier synchronization, waiting for all of the communications = to complete. - = - In the local computation phase, each walker polls a vertex ''v1'' - initi= ally, it is given by a user or the program, or it can be obtained from inte= rmediate results transmitted from other walkers after the first superstep. = - from a local queue, and it starts to traverse from ''v1'' on graph data t= hat reside in a local storage. During this phase, walker enqueues additiona= l vertices to be processed into the local queue. However, if some inserted = vertex ''v2'' do not reside in the local storage, the vertex ''v2'' is remo= ved from the local queue and inserted to the global queue that keeps vertic= es to be transmitted into other walkers at the next communication phase. - = - In the communication phase, each walker communicates intermediate data ab= out vertices that reside in local but are necessary to be computed with oth= er vertices that reside in another walkers. - = - In the barrier synchronization phase, the master controls all of the walk= ers with zookeepers. If there are no intermediate data to be processed, the= program will finish, and it is the end of one superstep. Otherwise, next s= uperstep will start with intermediate data transmitted among walkers. - = - =3D Example 1 =3D - We assume that given a large-scale graph data and a start vertex, we find= all shortest paths from the start vertex to every vertices on the given gr= aph data. In such a case, the program written in angrapa starts from the st= art vertex. During the local computation phase, each time walker dequeues a= vertex from the local queue, each walker enqueues its adjacent vertices wi= th paths from the start vertex to the current dequeued vertex. Then, an enq= ueued vertex is checked if it resides in the local partition. If not, it is= picked to the global queue. (To be described in more detail) - = - =3D Example 2 =3D - For example, we assume that given a subgraph ''s'', we find a set of subg= raphs isomorphic to the given subgraph ''s''. In such a case, angrapa will = be very desirable because each walker can independently begin with some sta= rt vertex matched to the subgraph ''s''. The program will needs only one su= perstep if the subgraph ''s'' is small enough to be within a partition. (To= be described in more detail) -=20