incubator-giraph-dev mailing list archives

From Claudio Martella <claudio.marte...@gmail.com>
Subject Re: why we should remove implicit vertex creation
Date Fri, 13 Jan 2012 08:10:09 GMT
Hi Avery,

Thanks for your feedback. I know that users can opt out of this
behavior, but to me that doesn't mean the three points no longer
hold.
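
To make the discussion concrete, here is a minimal sketch of the two
behaviors being compared: creating a vertex when a message arrives for an
unknown id, or dropping the message. Everything in it is hypothetical
(MissingVertexPolicy, ResolverSketch, deliver); it is not Giraph's actual
VertexResolver signature, and vertices are reduced to plain inboxes of
strings. Note that the existence check on the vertex map runs under both
policies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a resolver policy; NOT Giraph's VertexResolver API.
interface MissingVertexPolicy {
    // Returns true if a vertex should be created for messages sent to an unknown id.
    boolean createOnMessage(long vertexId, List<String> messages);
}

class ResolverSketch {
    // Implicit creation: always create the missing vertex.
    static final MissingVertexPolicy CREATE_IMPLICITLY = (id, msgs) -> true;
    // Alternative policy: never create, so the messages are discarded.
    static final MissingVertexPolicy DROP_MESSAGES = (id, msgs) -> false;

    // Delivers pending messages into the vertex set (id -> inbox) and
    // returns how many messages were dropped.
    static int deliver(Map<Long, List<String>> vertices,
                       Map<Long, List<String>> pending,
                       MissingVertexPolicy policy) {
        int dropped = 0;
        for (Map.Entry<Long, List<String>> e : pending.entrySet()) {
            // This lookup happens for every destination, whatever the policy.
            List<String> inbox = vertices.get(e.getKey());
            if (inbox == null) {
                if (!policy.createOnMessage(e.getKey(), e.getValue())) {
                    dropped += e.getValue().size();
                    continue;
                }
                inbox = new ArrayList<>();
                vertices.put(e.getKey(), inbox);
            }
            inbox.addAll(e.getValue());
        }
        return dropped;
    }
}
```

Swapping the policy changes what happens to the message, but not the fact
that the vertex set has to be consulted.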

On Fri, Jan 13, 2012 at 8:35 AM, Avery Ching <aching@apache.org> wrote:
> Claudio,
>
> You are right that vertices are created automatically when messages are sent
> to non-existent vertices.  But that behavior can be made
> application-specific.  The default resolution of mutations/messages is
> VertexResolver.  But you are always welcome to implement your own
> application-specific behavior.  For instance, you might just want to drop the
> message.  If there
> is a simultaneous create/delete, you may want to always create.  You have
> the power to implement any behavior you want by setting the vertex resolver
> (see GiraphJob#setVertexResolverClass()).
>
> Hope this helps,
>
> Avery
>
>
> On 1/12/12 3:42 PM, Claudio Martella wrote:
>>
>> Hello Giraphers,
>>
>> I have a few comments about the current design of Giraph regarding the
>> implicit creation of vertices.
>> As it's currently designed, if you send a message to a non-existent
>> vertex, Giraph creates it for you.
>> Although I understand this can be handy, as it allows for lazy
>> dataset creation, I think it comes at a cost, and I believe the
>> cost outweighs the advantage:
>>
>> 1) it overlaps the mutation API, where a vertex can be created
>> explicitly when the semantics of the algorithm require it, with
>> knowledge about what's going on and with explicit state. This is an
>> ambiguous and unclear part of the API which is difficult for me to
>> justify and probably confusing for the user too. Which brings me to
>> the second point.
>>
>> 2) it requires a different, and partially duplicated, code path for
>> mutations and implicit vertex creation in our code, as is clear from
>> looking at BasicRPCCommunication and as I experienced first-hand in
>> the email I recently sent to the list. Which brings me to the third
>> point.
>>
>> 3) in order to manage this, for every message we have to hit, sooner
>> or later, the worker's vertex set to see whether the vertex exists
>> and whether it should be implicitly created. This is computationally
>> expensive whether you use a HashMap or a TreeMap for range
>> partitioning. Also, if we're going to create more exotic
>> partitioning (topology-partitioning?), we're going to hit the problem
>> even more.
>>
>> In general, I don't know of any graph API that doesn't require you
>> either to list the vertex set explicitly at load time or to create
>> vertices explicitly through the API. As I said, I understand it allows
>> for lazy creation of the input file, in which some vertices may not be
>> listed explicitly (absent as a source vertex but present as the
>> endpoint of an edge), but this could be fixed robustly by a single
>> MapReduce job.
>>
>> What do you guys think?
>>
>
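
As a rough illustration of the preprocessing step mentioned in the original
message, the sketch below finds vertices that appear only as edge endpoints.
It is a single-machine, in-memory stand-in for what would be a MapReduce job
on real input; the class and method names (MissingVertexScan,
missingVertices) and the adjacency-map input shape are assumptions made for
the example, not Giraph code.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class MissingVertexScan {
    // adjacency: source vertex id -> target vertex ids, as read from the input.
    // Returns the ids that occur as an edge target but have no entry of their
    // own, i.e. the vertices that would otherwise be created implicitly.
    static Set<Long> missingVertices(Map<Long, List<Long>> adjacency) {
        Set<Long> missing = new TreeSet<>();
        for (List<Long> targets : adjacency.values()) {
            for (Long target : targets) {
                if (!adjacency.containsKey(target)) {
                    missing.add(target);
                }
            }
        }
        return missing;
    }
}
```

Adding an empty entry for each returned id before loading would give the
job a complete, explicit vertex set.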



-- 
   Claudio Martella
   claudio.martella@gmail.com
