Date: Tue, 29 Jan 2013 09:52:13 +0100
Subject: Re: Multiple node types in Giraph and doing a selective M/R over one of them
From: David Koch
To: user@giraph.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=f46d043bd8a08c5b2304d4698212

--f46d043bd8a08c5b2304d4698212
Content-Type: text/plain; charset=ISO-8859-1

Hello Claudio and Eli,

Thank you for your answers. As for Map/Reduce being a better tool for the
job: I was under the impression that Giraph itself relies on the M/R
framework. It looks that way when I check the console output of the examples
on the project's wiki.

Again, thank you.

/David

On Mon, Jan 28, 2013 at 8:49 PM, Claudio Martella <claudio.martella@gmail.com> wrote:

> One more general point would be whether Giraph is a better tool for your
> problem. From my understanding, map reduce is probably the way to go.
>
>
> On Monday, January 28, 2013, Eli Reisman wrote:
>
>> I agree, something like this is possible using the vertex value. In
>> Giraph, we now have native support for multigraphs, but before we had that
>> support, I described a kind of "cheat" to process multigraphs. You could
>> use a variation of that same cheat (it's on the site's Confluence wiki) to do
>> what you're talking about, I think, even though you're not dealing with a
>> multigraph in the problem you described. Essentially, you can get clever
>> about what sort of Writable you use for the vertex value type, and/or what
>> the values it holds can represent in your dataset.
>>
>> Alternately, on the off chance that the row-keys do not repeat in the
>> tables, then really the "row key" can be a Writable vertex ID as long as
>> each is unique. The only repetition would be the fact that other rows with
>> their own unique row-keys contain row values that mark out-edges to other
>> unique row-keys in the table, possibly more than once, since any row-key
>> could have lots of other rows "pointing" an out-edge value towards it.
>> Thinking of each row key as a unique vertex ID then just turns this into a
>> vanilla graph. However, if the row keys are not unique among all your
>> tables, this oversimplifies the problem and you really are stuck with the
>> above vertex value option.
>>
>> My point: Giraph has vertex value, ID, out-edge-to-other-vertex ID, and
>> message data types, and as long as the properties required of each for a
>> graph are met, and each is a Writable, you can think of the problem (often)
>> in one of several ways that Giraph can support.
>>
>> One last thought: assuming the graph does not mutate during processing,
>> you could also write a custom input format that evaluates each row as it
>> builds it into a graph vertex data structure, and chooses only row keys
>> that are of a certain classification in your use case to make into graph
>> data for that job run, simply skipping the other rows as it reads them.
>> Again, this "solution" depends on the nature of your problem. Just
>> something to play with.
>>
>> Good luck with your use case!
>>
>> On Mon, Jan 28, 2013 at 7:09 AM, Claudio Martella <
>> claudio.martella@gmail.com> wrote:
>>
>>> Giraph does not support multipartite graphs in a natural way. But you can
>>> try to model your different sets through the vertex value. You can then
>>> propagate it (by composing it with the ID?) to the neighbors, and obtain
>>> your join.
>>>
>>>
>>> On Mon, Jan 28, 2013 at 2:52 PM, David Koch wrote:
>>>
>>>> Hello,
>>>>
>>>> In Giraph, is it possible to have different node types in a graph and
>>>> have a Map/Reduce iterate only over nodes of one type and their direct
>>>> successors?
>>>>
>>>> If that sounds a bit cryptic, here is something more about our use-case:
>>>> We have different HBase tables which we want to "pseudo-join" in
>>>> Map/Reduce computations. The node types I mentioned above correspond to the
>>>> respective row-key types used in each of those tables; edges arise
>>>> because the KeyValues in each table can contain row-key values
>>>> found in one of the other tables.
>>>>
>>>> The graph would describe these relations. In a Map/Reduce I then want
>>>> to be able to iterate over all nodes of a given type while also having
>>>> access to a node's successor nodes in the same Mapper instance or, better
>>>> yet, the same map() call. One would then carry out additional Gets to
>>>> retrieve the data from the tables, thus doing a fairly crude join.
>>>>
>>>> The graph is likely to change, so it would be nice if it could be
>>>> updated incrementally.
>>>>
>>>> Does all this sound like something that would be possible with Giraph?
>>>>
>>>> Thank you,
>>>>
>>>> /David
>>>>
>>>
>>>
>>> --
>>> Claudio Martella
>>> claudio.martella@gmail.com
>>>
>>
>>
>
> --
> Claudio Martella
> claudio.martella@gmail.com
>

--f46d043bd8a08c5b2304d4698212--
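
Two of the ideas in the thread, composing the node type with the row key to get a unique vertex ID, and skipping rows of unwanted types while the graph is being read, can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use the Giraph, Hadoop, or HBase APIs, and the names SketchRow, TypedVertexSketch, composeId, typeOf and loadOnly, as well as the "users" and "orders" table names, are hypothetical. It also assumes that node type names never contain ':'.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for one table row: node type, row key, referenced vertex IDs. */
final class SketchRow {
    final String nodeType;         // assumption: the name of the source table
    final String rowKey;           // row key, unique only within its own table
    final List<String> targetIds;  // composed IDs of the rows this row points to

    SketchRow(String nodeType, String rowKey, List<String> targetIds) {
        this.nodeType = nodeType;
        this.rowKey = rowKey;
        this.targetIds = targetIds;
    }
}

final class TypedVertexSketch {
    /** Compose a vertex ID that is unique across tables (assumes no ':' in type names). */
    static String composeId(String nodeType, String rowKey) {
        return nodeType + ":" + rowKey;
    }

    /** Recover the node type from an ID produced by composeId. */
    static String typeOf(String vertexId) {
        return vertexId.substring(0, vertexId.indexOf(':'));
    }

    /** Build an adjacency map from rows of wantedType only, skipping all other rows. */
    static Map<String, List<String>> loadOnly(String wantedType, List<SketchRow> rows) {
        Map<String, List<String>> graph = new LinkedHashMap<>();
        for (SketchRow row : rows) {
            if (!row.nodeType.equals(wantedType)) {
                continue; // rows of other node types never become vertices in this run
            }
            graph.put(composeId(row.nodeType, row.rowKey), new ArrayList<>(row.targetIds));
        }
        return graph;
    }

    public static void main(String[] args) {
        List<SketchRow> rows = Arrays.asList(
            new SketchRow("users", "u1", Arrays.asList(composeId("orders", "o7"))),
            new SketchRow("orders", "o7", Collections.<String>emptyList()));
        // Only "users" rows become vertices; their successor IDs stay available for later Gets.
        System.out.println(loadOnly("users", rows));
    }
}
```

In a real job the filtering step would presumably live in the custom input format Eli describes, and the composed ID would have to be wrapped in a Writable; both are left out here to keep the sketch free of framework dependencies.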