Mailing-List: contact user-help@giraph.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@giraph.apache.org
Received-SPF: pass (nike.apache.org: domain of ogdude@googlemail.com
 designates 209.85.217.175 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAFJOoJeQ+iumA2j8+-O5EV8bkbaNcRxfbs-esB_xButap363pQ@mail.gmail.com>
References: 
 <CAE24rAe=OjtmzahZnTASK=4cJa=fBxv+GLuq6cUp8Jn_xSU-RA@mail.gmail.com>
	<CAFJOoJfSYO2cpNMhRiiKAndcrPqRsnV+3Tr8pTATWPSY0gYeAA@mail.gmail.com>
	<CAOsFL5QumvoFkx9Jf0Xisu5UFnP-FA_3RHh6oEZu2qGbQUsyLA@mail.gmail.com>
	<CAFJOoJeQ+iumA2j8+-O5EV8bkbaNcRxfbs-esB_xButap363pQ@mail.gmail.com>
Date: Tue, 29 Jan 2013 09:52:13 +0100
Message-ID: 
 <CAE24rAcFsbgfHWhs39-Vx9QXfnqN9Wiq4FGciNuK8TXWr5V_ng@mail.gmail.com>
Subject: Re: Multiple node types in Giraph and doing a selective M/R over one
 of them
From: David Koch <ogdude@googlemail.com>
To: user@giraph.apache.org
Content-Type: multipart/alternative; boundary=f46d043bd8a08c5b2304d4698212

--f46d043bd8a08c5b2304d4698212
Content-Type: text/plain; charset=ISO-8859-1

Hello Claudio and Eli,

Thank you for your answers. As for Map/Reduce being the better tool for the
job: I was under the impression that Giraph itself relies on the M/R
framework. At least it looks that way from the console output of the
examples on the project's wiki.

Again, thank you.

/David

On Mon, Jan 28, 2013 at 8:49 PM, Claudio Martella <
claudio.martella@gmail.com> wrote:

> One more general point would be whether Giraph is the right tool for your
> problem. From my understanding, MapReduce is probably the way to go.
>
>
> On Monday, January 28, 2013, Eli Reisman wrote:
>
>> I agree, something like this is possible using the vertex value. Giraph
>> now has native support for multigraphs, but before we had that support
>> I described a kind of "cheat" to process multigraphs. I think you could
>> use a variation of that same cheat (it's on the site's Confluence wiki) to
>> do what you're talking about, even though the problem you described does
>> not involve a multigraph. Essentially, you can get clever about what sort
>> of Writable you use for the vertex value type, and/or what the values it
>> holds represent in your dataset.
>>
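A minimal sketch of the vertex-value idea, in plain Python rather than
Giraph's Java API. The TypedValue class, its node_type and payload fields,
and the byte layout are all hypothetical; in Giraph the counterpart would be
a custom Writable used as the vertex value type.

```python
import struct
from dataclasses import dataclass


@dataclass
class TypedValue:
    """Hypothetical vertex value: a node-type tag plus an opaque payload."""
    node_type: str   # e.g. the name of the table the row key came from
    payload: bytes   # whatever per-vertex data the job needs

    def write(self) -> bytes:
        # Length-prefixed encoding, similar in spirit to Writable.write().
        tag = self.node_type.encode("utf-8")
        return (struct.pack(">H", len(tag)) + tag
                + struct.pack(">I", len(self.payload)) + self.payload)

    @classmethod
    def read_fields(cls, data: bytes) -> "TypedValue":
        # Inverse of write(), similar in spirit to Writable.readFields().
        (tag_len,) = struct.unpack_from(">H", data, 0)
        tag = data[2:2 + tag_len].decode("utf-8")
        (payload_len,) = struct.unpack_from(">I", data, 2 + tag_len)
        start = 2 + tag_len + 4
        return cls(tag, data[start:start + payload_len])


def should_process(value: TypedValue, wanted_type: str) -> bool:
    # A compute step can return early for vertices of any other type.
    return value.node_type == wanted_type
```

With a value like this, a computation can restrict itself to one node type
by checking the tag first and doing nothing for every other vertex.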
>> Alternatively, on the off chance that the row keys do not repeat across
>> the tables, the row key itself can be a Writable vertex ID, as long as
>> each one is unique. The only repetition would then be that rows with their
>> own unique row keys contain values that mark out-edges to other unique row
>> keys, and a given row key can appear as a target more than once, since any
>> number of other rows can "point" an out-edge at it. Treating each row key
>> as a unique vertex ID then turns this into a vanilla graph. However, if
>> the row keys are not unique across all your tables, this oversimplifies
>> the problem and you really are stuck with the vertex value option above.
>>
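A rough sketch of the "row key as vertex ID" case, assuming row keys are
unique across all tables. The nested-dict table layout and the is_row_key
predicate are hypothetical stand-ins for however the rows are really read.

```python
def build_graph(tables, is_row_key):
    """Turn rows into {vertex_id: set of out-edge targets}.

    tables: {table_name: {row_key: [cell values]}} (hypothetical layout).
    is_row_key: predicate deciding whether a cell value references a row.
    """
    graph = {}
    for table_name, rows in tables.items():
        for row_key, values in rows.items():
            if row_key in graph:
                # Uniqueness does not hold, so this simple model breaks down.
                raise ValueError(f"row key {row_key!r} repeats across tables")
            # Every row becomes a vertex, even if it has no out-edges.
            graph[row_key] = {v for v in values if is_row_key(v)}
    return graph
```

If the uniqueness check fails for the real data, that is the signal to fall
back to carrying the type in the vertex value instead.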
>> My point: Giraph has vertex value, vertex ID, out-edge (to other vertex
>> IDs), and message data types. As long as each meets the properties a graph
>> requires and each is a Writable, you can (often) think of the problem in
>> one of several ways that Giraph can support.
>>
>> One last thought: assuming the graph does not mutate during processing,
>> you could also write a custom input format that evaluates each row as it
>> builds the graph vertex data structures, turns only the row keys of a
>> certain classification in your use case into graph data for that job run,
>> and simply skips the other rows as it reads them. Again, this "solution"
>> depends on the nature of your problem. Just something to play with.
>>
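The filtering step of such an input format could look roughly like this,
again in plain Python and not tied to Giraph's input format classes. The
classify function and the (row_key, references) row shape are assumptions
made for illustration.

```python
def read_vertices(rows, classify, wanted_type):
    """Yield (vertex_id, out_edges) only for rows of the wanted type.

    rows: iterable of (row_key, [referenced row keys]) pairs.
    classify: hypothetical function mapping a row key to its type.
    """
    for row_key, references in rows:
        if classify(row_key) != wanted_type:
            continue  # skip rows of other types as they are read
        yield row_key, list(references)
```

Note that the out-edges of the vertices that are kept may still point at
rows that were skipped, so the job has to tolerate or drop such edges.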
>> Good luck with your use case!
>>
>> On Mon, Jan 28, 2013 at 7:09 AM, Claudio Martella <
>> claudio.martella@gmail.com> wrote:
>>
>>> Giraph does not support multipartite graphs in a natural way. But you can
>>> try to model your different sets through the vertex value. You can then
>>> propagate it (by composing it with the ID?) to the neighbors, and obtain
>>> your join.
>>>
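One way to read the propagation step, sketched as a plain Python simulation
of a single message exchange. The function name, the dict-based graph and
the (type, id) message shape are hypothetical, not Giraph API.

```python
def propagate_types(graph, node_types):
    """Simulate one message exchange between vertices.

    graph: {vertex_id: iterable of out-edge targets}.
    node_types: {vertex_id: type tag}, standing in for the vertex value.
    Returns {vertex_id: sorted list of (sender_type, sender_id)} received.
    """
    inbox = {vertex_id: [] for vertex_id in graph}
    # Superstep 0: each vertex sends its type, composed with its ID,
    # along every out-edge.
    for vertex_id, targets in graph.items():
        for target in targets:
            message = (node_types[vertex_id], vertex_id)
            inbox.setdefault(target, []).append(message)
    # Superstep 1: each vertex now sees which typed vertices point at it.
    return {vertex_id: sorted(msgs) for vertex_id, msgs in inbox.items()}
```

After the exchange, each vertex holds the typed IDs of its predecessors,
which is the information the join needs.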
>>>
>>> On Mon, Jan 28, 2013 at 2:52 PM, David Koch <ogdude@googlemail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> In Giraph, is it possible to have different node types in a graph and
>>>> have a Map/Reduce iterate only over nodes of one given type and their
>>>> direct successors?
>>>>
>>>> If that sounds a bit cryptic, here is some more about our use case:
>>>> We have different HBase tables which we want to "pseudo-join" in
>>>> Map/Reduce computations. The node types I mentioned above correspond to
>>>> the respective row-key types used in each of those tables. Edges arise
>>>> because the KeyValues in each table can contain row-key values found in
>>>> one of the other tables.
>>>>
>>>> The graph would describe these relations. In a Map/Reduce I then want
>>>> to be able to iterate over all nodes of a given type while also having
>>>> access to a node's successor nodes in the same Mapper instance or better
>>>> yet the same map() call. One would then carry out additional Gets to
>>>> retrieve the data from the tables thus doing a fairly crude join.
>>>>
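To make the intended "crude join" concrete, a small sketch in plain Python.
Nested dicts stand in for the HBase tables and dict lookups stand in for
Gets; all names and the data layout are hypothetical.

```python
def crude_join(graph, node_types, tables, wanted_type):
    """For each vertex of wanted_type, fetch its row and its successors' rows.

    graph: {vertex_id: list of successor vertex IDs}.
    node_types: {vertex_id: type tag}.
    tables: {type: {row_key: row}}, a stand-in for the HBase tables.
    """
    for vertex_id, successors in graph.items():
        if node_types[vertex_id] != wanted_type:
            continue  # only iterate over nodes of the requested type
        row = tables[wanted_type].get(vertex_id)
        # One lookup per successor, in the table matching the successor's type.
        joined = [tables[node_types[s]].get(s) for s in successors]
        yield vertex_id, row, joined
```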
>>>> The Graph is likely to change so it would be nice if it could be
>>>> updated incrementally.
>>>>
>>>> Does all this sound like something that would be possible with Giraph?
>>>>
>>>> Thank you,
>>>>
>>>> /David
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>>    Claudio Martella
>>>    claudio.martella@gmail.com
>>>
>>
>>
>
> --
>    Claudio Martella
>    claudio.martella@gmail.com
>

--f46d043bd8a08c5b2304d4698212--