giraph-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Claudio Martella <claudio.marte...@gmail.com>
Subject Re: Multiple node types in Giraph and doing a selective M/R over one of them
Date Tue, 29 Jan 2013 09:26:31 GMT
That is correct, but that is not the reason. We use M/R for resource
allocation, but we do not inherit the limits of the M/R paradigm for
graphs. The thing is that you probably have a lot of data (I assume it
because you are using HBase, hence it is difficult for you to fit it all
into memory), and you do not have an iterative process (hence you do not
hit the costs of multiple M/R jobs). Plus, the join is a well understood
problem in M/R.


On Tue, Jan 29, 2013 at 9:52 AM, David Koch <ogdude@googlemail.com> wrote:

> Hello Claudio and Eli,
>
> Thank you for your answers. As far as Map/Reduce being a better tool for
> the job - I was under the impression that Giraph relies on the M/R
> framework. It seems like it when I check the console output of the examples
> on the project's Wiki.
>
> Again, thank you.
>
> /David
>
>
> On Mon, Jan 28, 2013 at 8:49 PM, Claudio Martella <
> claudio.martella@gmail.com> wrote:
>
>> One more general point would be whether giraph is a better tool for your
>> problem. From my understanding, map reduce is probably the way to go.
>>
>>
>> On Monday, January 28, 2013, Eli Reisman wrote:
>>
>>> I agree, something like this is possible using the vertex value. In
>>> giraph, we now have native support for multigraphs, but before we had that
>>> support, I described a kind of "cheat" to process multigraphs. You could
>>> use a variation of that same cheat (its on the site confluence wiki) to do
>>> what you're talking about I think, even though you're not dealing with a
>>> multigraph in the problem you described. Essentially, you can get clever
>>> about what sort of Writable you use for the vertex value type, and/or what
>>> the values it holds can represent in your dataset.
>>>
>>> Alternately, in the off chance that the row-keys do not repeat in the
>>> tables, then really the "row key" can be a Writable vertex ID as long as
>>> each is unique .The only repetition would be the fact that other rows with
>>> their own unique row-keys contain row values that mark out-edges to other
>>> unique row-keys in the table, but more than once since any row-key could
>>> have lots of other rows "pointing" an out-edge value towards it. Thinking
>>> of each row key as unique vertex ID then just turns this into a vanilla
>>> graph. However, if the row keys are not unique in among all your tables,
>>> this oversimplifies the problem and you really are stuck wtih the above
>>> vertex value option.
>>>
>>> My point: Giraph has vertex value, ID, out-edge-to-other-vertex ID's,
>>> and message data types, and as long as the properties required of each for
>>> a graph are met, and each is a Writable, you can think of the problem
>>> (often) in one of several ways that Giraph can support.
>>>
>>> One last thought: assuming the graph does not mutate during processing,
>>> you could also write a custom input format that evaluates each row as it
>>> builds it into a graph vertex data structure, and chooses only row keys
>>> that are of a certain classification in your use case to make into graph
>>> data for that job run, simply skipping the other rows as it reads them.
>>> again, this "solution" depends on the nature of your problem. Just
>>> something to play with.
>>>
>>> Good luck with your use case!
>>>
>>> On Mon, Jan 28, 2013 at 7:09 AM, Claudio Martella <
>>> claudio.martella@gmail.com> wrote:
>>>
>>>> Giraph does not support multipartite graph in a natural way. But you
>>>> can try to model your different sets through the vertexvalue. You can then
>>>> propagate it (by composing with the ID?) to the neighbors, and obtain your
>>>> join.
>>>>
>>>>
>>>> On Mon, Jan 28, 2013 at 2:52 PM, David Koch <ogdude@googlemail.com>wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> In Giraph is it possible to have different node types in a graph and
>>>>> have a Map/Reduce only iterate over nodes of this type and their direct
>>>>> successors?
>>>>>
>>>>> If it sounds a bit cryptic here is something more about our use-case:
>>>>> We have different HBase tables which we want to "pseudo-join" in
>>>>> Map/Reduce computations. The node types I mentioned above correspond
to the
>>>>> respective row-key types used in each of those tables, edges are generated
>>>>> by the fact that the KeyValues in each table can contain row-key values
>>>>> found in one of the other tables.
>>>>>
>>>>> The graph would describe these relations. In a Map/Reduce I then want
>>>>> to be able to iterate over all nodes of a given type while also having
>>>>> access to a node's successor nodes in the same Mapper instance or better
>>>>> yet the same map() call. One would then carry out additional Gets to
>>>>> retrieve the data from the tables thus doing a fairly crude join.
>>>>>
>>>>> The Graph is likely to change so it would be nice if it could be
>>>>> updated incrementally.
>>>>>
>>>>> Does all this sound like something that would be possible with Giraph?
>>>>>
>>>>> Thank you,
>>>>>
>>>>> /David
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>    Claudio Martella
>>>>    claudio.martella@gmail.com
>>>>
>>>
>>>
>>
>> --
>>    Claudio Martella
>>    claudio.martella@gmail.com
>>
>
>


-- 
   Claudio Martella
   claudio.martella@gmail.com

Mime
View raw message