From: Claudio Martella <claudio.martella@gmail.com>
Date: Thu, 2 May 2013 08:09:00 +0200
Subject: Re: Can I use Giraph on an application with two maps but no reduce?
To: user@giraph.apache.org

The question is: do you have 100GB of main memory? How big are your messages going to be? How dense is the graph?

Although we have out-of-core facilities, this does not look to me like a typical graph algorithm, and in particular not one that would take particular advantage of Giraph compared to MapReduce. This is because it has a low number of iterations (two), and hence, especially if you have memory constraints, it could work out pretty easily with MapReduce. It actually looks to me like a single map/reduce job, where the reducer could do the second iteration, but I could be missing some details.

As far as load balancing is concerned, I guess it depends on your degree distribution. Having a "random" distribution of vertices through hash-partitioning should back you up, but if you have a bunch of nodes that are much more active, you could have some stragglers.

On Thu, May 2, 2013 at 2:12 AM, Hadoop Explorer <hadoopexplorer@outlook.com> wrote:

> I have an application that evaluates a graph using this algorithm:
>
> - use a parallel for loop to evaluate all nodes in a graph (to evaluate a
> node, an image is read, and then the result of this node is calculated)
>
> - use a second parallel for loop to evaluate all edges in the graph. The
> function would take in the results from both nodes of the edge, and then
> calculate the answer for the edge
>
> The final result will consist of the calculated results of each edge.
> So each node and each edge is essentially a job, and in this case, an edge
> is more like a job than a message.
>
> As you can see, the above algorithm would employ two map functions, but no
> reduce function. The total data size can be very large (say 100GB). Also,
> the workload of each node and each edge is highly irregular, and thus load
> balancing mechanisms are essential.
>
> In this case, will Giraph suit this application? If so, what will my
> program look like? And will Giraph be able to strike a balance between good
> load balancing of the second map function and minimizing data transfer of
> the results from the first map function?

--
Claudio Martella
claudio.martella@gmail.com
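[Editor's note: a minimal plain-Java sketch of the two phases described in the question, in the map/reduce-style formulation suggested in the reply (phase 1 maps over nodes, phase 2 joins the two endpoint results per edge). `evalNode` and `evalEdge` are hypothetical stand-ins for the image-based node computation and the edge function; this is not Giraph API code.]

```java
import java.util.HashMap;
import java.util.Map;

// Two-phase graph evaluation: per-node values first, then per-edge values
// computed from both endpoints' results.
public class TwoPhaseGraphEval {
    // Phase 1: per-node evaluation (stand-in for the image-based computation).
    static long evalNode(int node) { return (long) node * node; }

    // Phase 2: per-edge evaluation, combining both endpoint results.
    static long evalEdge(long uVal, long vVal) { return uVal + vVal; }

    public static void main(String[] args) {
        int[][] edges = {{1, 2}, {2, 3}, {1, 3}};

        // Phase 1: "map" over nodes, computing each node's value once.
        Map<Integer, Long> nodeVals = new HashMap<>();
        for (int[] e : edges)
            for (int n : e)
                nodeVals.computeIfAbsent(n, TwoPhaseGraphEval::evalNode);

        // Phase 2: "map" over edges, joining the two endpoint results
        // (this is the step a reducer could perform in a MapReduce job).
        for (int[] e : edges)
            System.out.println(e[0] + "-" + e[1] + ": "
                    + evalEdge(nodeVals.get(e[0]), nodeVals.get(e[1])));
        // prints: 1-2: 5, 2-3: 13, 1-3: 10
    }
}
```

In MapReduce terms, phase 1 is the map; emitting each node value keyed by its incident edges and combining the two values per edge is what the reducer would do, which is why a single map/reduce job can cover both "maps" from the question.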