incubator-giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Avery Ching <>
Subject Re: why we should remove implicit vertex creation
Date Fri, 13 Jan 2012 19:05:15 GMT
Inline responses.

Happy Friday,


On 1/13/12 10:51 AM, Claudio Martella wrote:
> Hi Avery,
> thanks for your feedback.
> I'm advocating for allowing mutations only through Mutable interface
> methods. I agree that it can come handy to have the implicit vertex
> creation, for the reason you mentioned (which I called lazy inputset
> creation), but you can obtain the same through a simple  single M/R
> job run in advance.
I think this is pretty expensive (extra MR job).  Users do have this 
option, but I doubt many would take it when they don't have to.

> What we win back is that we don't have the
> computational cost, and code complexity of checking if the vertex
> exists already for each message we get.
Checking if the vertex exists is pretty cheap in a hashmap (constant 
time).  We should verify that this is a computational overhead (maybe 
some profiling) before optimizing it.  I suppose we could add a switch 
to bypass any graph mutation in general.
> You know what I mean?
> On Fri, Jan 13, 2012 at 7:44 PM, Avery Ching<>  wrote:
>> Claudio,
>> What are you advocating in particular?
>> Graph mutation should be allowed (i.e. adding vertices).  We allow this to
>> happen through the addVertexReq() interface and through the VertexResolver
>> implementation (say for messages to non-existent vertices).  I can see why
>> this would be useful.  Imagine you are computing page rank on the web graph,
>> but you only have a subset of the sites, but all the outlinks for each site.
>>   It is nice to be able to allow new vertices (sites) while running the
>> application.
>> I agree that the way that vertices are created and initialized is a bit
>> vague.  We can work on improving the interfaces if anyone has suggestions.
>> Avery
>> On 1/13/12 12:10 AM, Claudio Martella wrote:
>>> Hi Avery,
>>> thanks for your feedback. I know that users can decide to drop this
>>> behavior, but this doesn't mean that those three points don't hold, to
>>> me.
>>> On Fri, Jan 13, 2012 at 8:35 AM, Avery Ching<>    wrote:
>>>> Claudio,
>>>> You are right that vertices are created automatically when messages are
>>>> sent
>>>> to non-existent vertices.  But that behavior can be made application
>>>> specific.  The default resolution of mutations/messages is
>>>> VertexResolver.
>>>>   But you are always welcome to implement your own application specific
>>>> behavior.  For instance, you might just want to drop the message.  If
>>>> there
>>>> is a simultaneous create/delete, you may want to always create.  You have
>>>> the power to implement any behavior you want by setting the vertex
>>>> resolver
>>>> (see GiraphJob#setVertexResolverClass()).
>>>> Hope this helps,
>>>> Avery
>>>> On 1/12/12 3:42 PM, Claudio Martella wrote:
>>>>> Hello Giraphers,
>>>>> I have a few comments about the current design of Giraph regarding the
>>>>> implicit creation of vertices.
>>>>> As it's currently designed, if you send a message to a non-existent
>>>>> vertices, Giraph creates it for you.
>>>>> Although I can understand it can get handy as it allows for lazy
>>>>> dataset creation, I think it comes at some cost and I believe this
>>>>> cost is bigger than the advantage:
>>>>> 1) it overlaps the mutation API, where a vertex can be created
>>>>> explicitly when the semantics of the algorithm require it, with
>>>>> knowledge about what's going on and with explicit state. This is an
>>>>> ambiguous and unclear part of the API which is difficult for me to
>>>>> justify and probably confusing for the user too. Which brings me to
>>>>> the second point.
>>>>> 2) it requires a different, and partially duplicate,code path for
>>>>> mutations and implicit vertex creation in our code, as it's clear by
>>>>> looking at BasicRPCCommunication and as it's been experienced
>>>>> currently by me in the email I recently sent to the list. Which brings
>>>>> me to the third point.
>>>>> 3) in order to manage this, for every message we have to hit, sooner
>>>>> or later, the Worker vertices set to see if the vertex is existing and
>>>>> whether it should be implicitly created. This is computationally
>>>>> expensive both if you have a HashMap but also if you have a TreeMap
>>>>> for range partitioning. Also, if we're going to create more exotic
>>>>> partitioning (topology-partitioning?), we're going to hit the problem
>>>>> more.
>>>>> In general, I don't know any graph API that doesn't require to either
>>>>> list explicitly the vertex set at load or to create the vertex
>>>>> explicitly through API. As I said, I understand it allows for lazy
>>>>> creation of the input file, with possibly missing vertices explicitly
>>>>> enlisted (missing as a source vertex but existing as an endpoint for
>>>>> an edge), but this could be really fixed robustly by a single
>>>>> MapReduce job.
>>>>> What do you guys think?

View raw message