giraph-user mailing list archives

From Alessandro Presta <alessan...@fb.com>
Subject Re: Differences with Edge and Vertex Input Format
Date Sun, 20 Jan 2013 16:42:45 GMT
Hi Peter,

Good questions.

1) If you only specify an EdgeInputFormat, vertex values will be initialized to their type's
default value. You can also specify a VertexValueInputFormat, which is just a more convenient
API around VertexInputFormat to read vertex values.

2) Vertices with only in-edges will be created when they receive their first message, unless
you override VertexResolver to define some other behavior.

3) In general, vertex input is more efficient because, as you said, less data needs to be
shuffled around the network, and because it's a more compact representation. However, if your
original dataset is a list of edges, the additional step of grouping them by source vertex
before loading might be more expensive than letting Giraph do it (depending on your
infrastructure).

4) We don't have a way to enforce which worker reads which splits, so I think in general
you can expect most of the data to be shuffled across workers, even if the input is pre-partitioned.
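
To make the "grouping them by source vertex" step in (3) concrete, here is a minimal sketch in plain Java. It does not use the Giraph API: the class name EdgeGrouping, the method groupBySource, and the long[][] edge encoding are all hypothetical and chosen only for illustration. It turns a flat edge list into the adjacency-list shape that vertex input expects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; none of these names come from the Giraph API.
final class EdgeGrouping {

    /** Groups {source, target} pairs into an adjacency list keyed by source vertex id. */
    static Map<Long, List<Long>> groupBySource(long[][] edges) {
        Map<Long, List<Long>> adjacency = new TreeMap<>();
        for (long[] edge : edges) {
            // edge[0] is the source id, edge[1] the target id (assumed encoding).
            adjacency.computeIfAbsent(edge[0], id -> new ArrayList<>()).add(edge[1]);
        }
        return adjacency;
    }

    public static void main(String[] args) {
        long[][] edges = { {1, 2}, {1, 3}, {2, 3} };
        // Vertex 3 has only in-edges, so it never appears as a key here.
        System.out.println(groupBySource(edges));
    }
}
```

Note that a vertex appearing only as a target (vertex 3 in the example) gets no entry of its own in this grouping, which is the same situation question (2) asks about. On a large dataset this grouping would typically be a separate distributed job, and that is the extra cost mentioned in (3).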

Alessandro


On Jan 20, 2013, at 6:12 AM, "Peter Morgan" <pmorgan246@gmail.com> wrote:

> I'm interested in hearing about the differences in loading using the edge and vertex
> inputs. In particular, I have a few questions:
> 
> 1) How can vertex state be set using edge input format?
> 2) How are vertices with only in-edges initialised using edge input format?
> 3) Is either vertex or edge input more efficient for loading? I guess less needs to be
> shuffled around the network using vertex input?
> 4) If your adjacency list for vertex input is pre-partitioned, does this decrease loading
> time as again vertices don't need to be shuffled across the network?
> 
> Thanks in advance for any help.
> Peter
