giraph-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Koch <ogd...@googlemail.com>
Subject Re: Multiple node types in Giraph and doing a selective M/R over one of them
Date Tue, 29 Jan 2013 08:52:13 GMT
Hello Claudio and Eli,

Thank you for your answers. As far as Map/Reduce being a better tool for
the job - I was under the impression that Giraph relies on the M/R
framework. It seems like it when I check the console output of the examples
on the project's Wiki.

Again, thank you.

/David

On Mon, Jan 28, 2013 at 8:49 PM, Claudio Martella <
claudio.martella@gmail.com> wrote:

> One more general point would be whether giraph is a better tool for your
> problem. From my understanding, map reduce is probably the way to go.
>
>
> On Monday, January 28, 2013, Eli Reisman wrote:
>
>> I agree, something like this is possible using the vertex value. In
>> giraph, we now have native support for multigraphs, but before we had that
>> support, I described a kind of "cheat" to process multigraphs. You could
>> use a variation of that same cheat (its on the site confluence wiki) to do
>> what you're talking about I think, even though you're not dealing with a
>> multigraph in the problem you described. Essentially, you can get clever
>> about what sort of Writable you use for the vertex value type, and/or what
>> the values it holds can represent in your dataset.
>>
>> Alternately, in the off chance that the row-keys do not repeat in the
>> tables, then really the "row key" can be a Writable vertex ID as long as
>> each is unique .The only repetition would be the fact that other rows with
>> their own unique row-keys contain row values that mark out-edges to other
>> unique row-keys in the table, but more than once since any row-key could
>> have lots of other rows "pointing" an out-edge value towards it. Thinking
>> of each row key as unique vertex ID then just turns this into a vanilla
>> graph. However, if the row keys are not unique in among all your tables,
>> this oversimplifies the problem and you really are stuck wtih the above
>> vertex value option.
>>
>> My point: Giraph has vertex value, ID, out-edge-to-other-vertex ID's, and
>> message data types, and as long as the properties required of each for a
>> graph are met, and each is a Writable, you can think of the problem (often)
>> in one of several ways that Giraph can support.
>>
>> One last thought: assuming the graph does not mutate during processing,
>> you could also write a custom input format that evaluates each row as it
>> builds it into a graph vertex data structure, and chooses only row keys
>> that are of a certain classification in your use case to make into graph
>> data for that job run, simply skipping the other rows as it reads them.
>> again, this "solution" depends on the nature of your problem. Just
>> something to play with.
>>
>> Good luck with your use case!
>>
>> On Mon, Jan 28, 2013 at 7:09 AM, Claudio Martella <
>> claudio.martella@gmail.com> wrote:
>>
>>> Giraph does not support multipartite graph in a natural way. But you can
>>> try to model your different sets through the vertexvalue. You can then
>>> propagate it (by composing with the ID?) to the neighbors, and obtain your
>>> join.
>>>
>>>
>>> On Mon, Jan 28, 2013 at 2:52 PM, David Koch <ogdude@googlemail.com>wrote:
>>>
>>>> Hello,
>>>>
>>>> In Giraph is it possible to have different node types in a graph and
>>>> have a Map/Reduce only iterate over nodes of this type and their direct
>>>> successors?
>>>>
>>>> If it sounds a bit cryptic here is something more about our use-case:
>>>> We have different HBase tables which we want to "pseudo-join" in
>>>> Map/Reduce computations. The node types I mentioned above correspond to the
>>>> respective row-key types used in each of those tables, edges are generated
>>>> by the fact that the KeyValues in each table can contain row-key values
>>>> found in one of the other tables.
>>>>
>>>> The graph would describe these relations. In a Map/Reduce I then want
>>>> to be able to iterate over all nodes of a given type while also having
>>>> access to a node's successor nodes in the same Mapper instance or better
>>>> yet the same map() call. One would then carry out additional Gets to
>>>> retrieve the data from the tables thus doing a fairly crude join.
>>>>
>>>> The Graph is likely to change so it would be nice if it could be
>>>> updated incrementally.
>>>>
>>>> Does all this sound like something that would be possible with Giraph?
>>>>
>>>> Thank you,
>>>>
>>>> /David
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>>    Claudio Martella
>>>    claudio.martella@gmail.com
>>>
>>
>>
>
> --
>    Claudio Martella
>    claudio.martella@gmail.com
>

Mime
View raw message