Hi,
I believe the answer to your question is yes, though I've never done it. If
you use only the edge reader, only the vertices in your graph that have at
least one edge attached to them will be present in your graph. So, if you
have vertices that are entirely disconnected that you want included, you'd
need to do both a VertexReader and an Edge Reader (though I've never done
this). If you don't have disconnected vertices, you don't need the
VertexReader because Giraph will automatically add all of the vertices in
your edge file to the graph (I think this can be disabled). Use the eip
flag to specify the edge file.
Best,
Matthew Saltz
On Wed, Mar 11, 2015 at 1:54 AM, G.W. <gwindels+a@gmail.com> wrote:
> Thanks for that!
>
> This is the right idea, however I was only using a VertexReader until now
> – IntNullReverseTextEdgeInputFormat calls for an EdgeReader.
>
> I am not sure this is the way it works but I like the idea of segregating
> edge and vertex definitions.
>
> *That leads to the following questions: can Giraph support the use of a
> VertexReader and EdgeReader at the same time, that is through the vif and
> eif arguments? *
>
> If that works, the idea would be to refactor my input files with:
>
> Vertices:
> vertex_id, vertex_type, ...
>
> Edges
> source_id, target_id
>
> with the edge reader working in "reverse" mode.
>
> Thanks!
>
>
>
>
> On 10 March 2015 at 20:02, Matthew Saltz <saltzm@gmail.com> wrote:
>
>> Have a look at IntNullReverseTextEdgeInputFormat
>> <https://giraph.apache.org/apidocs/org/apache/giraph/io/formats/IntNullReverseTextEdgeInputFormat.html>.
>> It automatically creates reverse edges, but it expects the file format
>>
>> <source_id, target_id>
>>
>> on each line. If you need to convert it to use longs you can change the
>> code pretty easily.
>>
>> Best,
>> Matthew
>>
>> On Tue, Mar 10, 2015 at 5:37 AM, Young Han <young.han@uwaterloo.ca>
>> wrote:
>>
>>> The input is assumed to be the vertex followed by a set of *directed*
>>> edges. So, in your example, leaving out E2 means that the final graph will
>>> not have the directed edge from V2 to V1. To get an undirected edge, you
>>> need a pair of directed edges.
>>>
>>> Internally, Giraph stores the outedges of each vertex as an adjacency
>>> list at that vertex. So, for example, your undirected graph becomes a
>>> vertex object V1 with an adjacency list {V2} and a vertex object V2 with an
>>> adjacency list {V1}. The directed graph would be a vertex V1 with adjacency
>>> list {V2} and a vertex V2 with an empty adjacency list {}.
>>>
>>> There's no easy way for Giraph to infer that V2's adjacency list should
>>> contain V1, because V2 does not track its inedges. To get around this, you
>>> can either (1) use an undirected input file with both pairs of edges
>>> present; (2) have, in your algorithms, all vertices broadcast their ids to
>>> their outedge neighbours and then perform mutations to add the missing
>>> edges in the first two superstep; or (3) modify the code in
>>> org.apache.giraph.io.* (in giraphcore) to cache and add missing edges
>>> (i.e., add a new "type" of input format). I'm fairly certain that there
>>> doesn't already exist an "assume undirected graph" input reader, but I'm
>>> not too familiar with the code paths and options there so I could be wrong.
>>>
>>> Young
>>>
>>> On Mon, Mar 9, 2015 at 11:54 PM, G.W. <gwindels+a@gmail.com> wrote:
>>>
>>>> Hi Giraph Mailing List,
>>>>
>>>> I am writing about an undirected graph I am trying to move to Giraph. I
>>>> have a question about the assumption Giraph makes when processing an input.
>>>>
>>>> Let V1 and V2, two vertices connected with a common edge. E1 defines an
>>>> edge from V1 to V2. E2 defines an edge from V2 to V1.
>>>>
>>>> Simply put, these are defined in an input file as:
>>>> V1, E1
>>>> V2, E2
>>>>
>>>> This is working fine, I can process the graph accordingly.
>>>>
>>>> I was trying to see what would happen if I was to simplify the input
>>>> file:
>>>> V1, E1
>>>> V2
>>>>
>>>> When would come the time that V2 is processed in a superstep, Giraph
>>>> would not suggest E1 as an outgoing edge. My questions is why, knowing
>>>> that E1 defines an edge from V1 to V2. The graph being undirected (although
>>>> there is no provision for that in my Giraph computation), shouldn't Giraph
>>>> assume that V2 is connected to V1?
>>>>
>>>> Down the road, the idea would be to streamline the input format, hence
>>>> my question.
>>>>
>>>> Thanks!
>>>>
>>>>
>>>>
>>>
>>
>
