giraph-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "G.W." <>
Subject Re: Undirected Vertex Definition and Reflexivity
Date Wed, 11 Mar 2015 00:54:48 GMT
Thanks for that!

This is the right idea, however I was only using a VertexReader until now
– IntNullReverseTextEdgeInputFormat calls for an EdgeReader.

I am not sure this is the way it works but I like the idea of segregating
edge and vertex definitions.

*That leads to the following questions: can Giraph support the use of a
VertexReader and EdgeReader at the same time, that is through the -vif and
-eif arguments? *

If that works, the idea would be to refactor my input files with:

vertex_id, vertex_type, ...

source_id, target_id

with the edge reader working in "reverse" mode.


On 10 March 2015 at 20:02, Matthew Saltz <> wrote:

> Have a look at IntNullReverseTextEdgeInputFormat
> <>.
> It automatically creates reverse edges, but it expects the file format
> <source_id, target_id>
> on each line. If you need to convert it to use longs you can change the
> code pretty easily.
> Best,
> Matthew
> On Tue, Mar 10, 2015 at 5:37 AM, Young Han <> wrote:
>> The input is assumed to be the vertex followed by a set of *directed*
>> edges. So, in your example, leaving out E2 means that the final graph will
>> not have the directed edge from V2 to V1. To get an undirected edge, you
>> need a pair of directed edges.
>> Internally, Giraph stores the out-edges of each vertex as an adjacency
>> list at that vertex. So, for example, your undirected graph becomes a
>> vertex object V1 with an adjacency list {V2} and a vertex object V2 with an
>> adjacency list {V1}. The directed graph would be a vertex V1 with adjacency
>> list {V2} and a vertex V2 with an empty adjacency list {}.
>> There's no easy way for Giraph to infer that V2's adjacency list should
>> contain V1, because V2 does not track its in-edges. To get around this, you
>> can either (1) use an undirected input file with both pairs of edges
>> present; (2) have, in your algorithms, all vertices broadcast their ids to
>> their out-edge neighbours and then perform mutations to add the missing
>> edges in the first two superstep; or (3) modify the code in
>>* (in giraph-core) to cache and add missing edges
>> (i.e., add a new "type" of input format). I'm fairly certain that there
>> doesn't already exist an "assume undirected graph" input reader, but I'm
>> not too familiar with the code paths and options there so I could be wrong.
>> Young
>> On Mon, Mar 9, 2015 at 11:54 PM, G.W. <> wrote:
>>> Hi Giraph Mailing List,
>>> I am writing about an undirected graph I am trying to move to Giraph. I
>>> have a question about the assumption Giraph makes when processing an input.
>>> Let V1 and V2, two vertices connected with a common edge. E1 defines an
>>> edge from V1 to V2. E2 defines an edge from V2 to V1.
>>> Simply put, these are defined in an input file as:
>>> V1, E1
>>> V2, E2
>>> This is working fine, I can process the graph accordingly.
>>> I was trying to see what would happen if I was to simplify the input
>>> file:
>>> V1, E1
>>> V2
>>> When would come the time that V2 is processed in a superstep, Giraph
>>> would not suggest E1 as an  outgoing edge. My questions is why, knowing
>>> that E1 defines an edge from V1 to V2. The graph being undirected (although
>>> there is no provision for that in my Giraph computation), shouldn't Giraph
>>> assume that V2 is connected to V1?
>>> Down the road, the idea would be to streamline the input format, hence
>>> my question.
>>> Thanks!

View raw message