giraph-user mailing list archives

From Alessandro Presta
Subject Re: Questions on input/output format
Date Wed, 15 May 2013 16:36:39 GMT
Hi Han,

You are correct: if you are loading the graph with an EdgeInputFormat, but also need to load
additional data for vertices, you want to use a VertexValueInputFormat.
You can see an example in TestEdgeInput.
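
To make the two-read idea concrete, here is a minimal plain-Java sketch of what combining the two inputs amounts to. It deliberately uses no Giraph classes: one pass over edge lines builds the adjacency, and a second pass over vertex lines attaches a per-vertex value such as the bipartite set flag. The class name, the method names, and both line layouts are assumptions made for illustration only; the actual Giraph usage is the one in TestEdgeInput.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BipartiteLoadSketch {

    // Pass 1: each line is "source target edgeValue" (assumed whitespace-separated).
    static Map<String, Map<String, Double>> readEdges(List<String> lines) {
        Map<String, Map<String, Double>> adjacency = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            adjacency.computeIfAbsent(parts[0], k -> new TreeMap<>())
                     .put(parts[1], Double.parseDouble(parts[2]));
            // Register the target as a vertex too, even if it has no out-edges.
            adjacency.computeIfAbsent(parts[1], k -> new TreeMap<>());
        }
        return adjacency;
    }

    // Pass 2: each line is "vertexId flag" (hypothetical layout for the set flag).
    static Map<String, String> readVertexValues(List<String> lines) {
        Map<String, String> values = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            values.put(parts[0], parts[1]);
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> adjacency =
                readEdges(Arrays.asList("u1 i1 4.0", "u2 i1 2.5"));
        Map<String, String> flags =
                readVertexValues(Arrays.asList("u1 USER", "u2 USER", "i1 ITEM"));
        // Print every vertex with its flag and its outgoing edges.
        for (String id : adjacency.keySet()) {
            System.out.println(id + " [" + flags.get(id) + "] -> " + adjacency.get(id));
        }
    }
}
```

The point of the sketch is only that the edge data and the vertex data can live in two separate files and be joined on the vertex id.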


From: Han JU
Date: Wednesday, May 15, 2013 9:00 AM
Subject: Re: Questions on input/output format

Thanks Maria.

For the input part, in fact what I want to load is a bipartite graph, so nodes are in two
separate sets. If I use TextEdgeInputFormat, how could I load data for the nodes? (for example
a flag indicating in which set the node is).

On the website it says: In the second case, edges will be read by means of an EdgeInputFormat.
If there is additional data for the vertices, it will be read separately by a VertexValueInputFormat.
So it seems to me that there should be two separate reads: the first one reads all the edges
of the bipartite graph, and the second one reads the nodes with their data. But I can't find
any examples of how to do this.

2013/5/15 Maria Stylianou wrote:
The InputFormat is the code needed to read the input file. So you cannot have two InputFormats;
you should choose one of the two.
From my understanding, TextEdgeInputFormat is more suitable for you, as it takes exactly
the format of your input file: node1 node2 edgeValue
The TextVertexInputFormat reads files with the format:
nodeId nodeValue {list with edges values}
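
As a rough illustration of the vertex-oriented layout, the plain-Java sketch below parses one such line. It assumes, hypothetically, that the edge list is written as whitespace-separated "targetId edgeValue" pairs after the node value; the message does not spell out the exact layout, and the class and method names are made up for this example. No Giraph classes are used.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class VertexLineSketch {

    // Holds one parsed vertex line: id, value, and outgoing edges.
    static final class VertexLine {
        final String id;
        final double value;
        final Map<String, Double> edges = new LinkedHashMap<>();

        VertexLine(String id, double value) {
            this.id = id;
            this.value = value;
        }
    }

    // Assumed layout: "nodeId nodeValue target1 edgeValue1 target2 edgeValue2 ..."
    static VertexLine parseVertexLine(String line) {
        String[] parts = line.trim().split("\\s+");
        VertexLine vertex = new VertexLine(parts[0], Double.parseDouble(parts[1]));
        // Remaining tokens are consumed two at a time as (target, edgeValue).
        for (int i = 2; i + 1 < parts.length; i += 2) {
            vertex.edges.put(parts[i], Double.parseDouble(parts[i + 1]));
        }
        return vertex;
    }

    public static void main(String[] args) {
        VertexLine vertex = parseVertexLine("a 1.0 b 0.5 c 2.0");
        System.out.println(vertex.id + " value=" + vertex.value + " edges=" + vertex.edges);
    }
}
```

With the edge-oriented layout, the same data would instead be spread over one "node1 node2 edgeValue" line per edge, with no place for the node value.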

As for the outputFormat, if you want to print several parameters/results from your code,
then I would suggest creating your own outputFormat that extends TextVertexOutputFormat;
in convertVertexToLine() you can specify what is printed for each vertex.
For example, say each vertex calculates an error that you can retrieve through a public
method getError(). In convertVertexToLine(), you can have
int error = ((yourMainCodeName) vertex).getError();

and then you shape the line to be printed for each vertex, for example:
Text line = new Text("vertexId: " + vertex.getId().toString() + ", error: " + error);
return line;
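
For reference, here is the string-building step in isolation, as a small plain-Java sketch with no Giraph or Hadoop types. The class name, the method name, and the sample values are hypothetical; in a real output format the resulting string would be wrapped in a Text as in the snippet above.

```java
public class VertexLineFormatSketch {

    // Builds the text for one vertex from its id and its computed error.
    static String formatLine(String vertexId, int error) {
        return "vertexId: " + vertexId + ", error: " + error;
    }

    public static void main(String[] args) {
        // Sample values chosen only for illustration.
        System.out.println(formatLine("42", 3));
    }
}
```

Keeping the formatting in a small helper like this makes it easy to check the output layout without running a job.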

I hope I didn't make it more complicated :)

On Wed, May 15, 2013 at 12:27 PM, Han JU wrote:

Some questions:

  - My input file is a text file with edges: node1 node2 edgeValue. I figured out that
I should use TextEdgeInputFormat and TextVertexValueInputFormat. But how do these two things
fit together? Should I prepare another file that contains only the node information for VertexValueInputFormat?

  - If the input file is a sequence file, how should I implement a SequenceEdgeInputFormat
or SequenceVertexInputFormat? Or do they already exist?

  - For the output part, after the calculation terminates, every vertex needs
to output many lines. This could be big (for one dataset the output size is 400GB). I found
only TextVertexOutputFormat, but it seems to output a single line per vertex. How should
I achieve this?

Thanks a lot!

JU Han

Software Engineer Intern @ KXEN Inc.
UTC   -  Université de Technologie de Compiègne
     GI06 - Fouille de Données et Décisionnel

+33 0619608888

Maria Stylianou
Intern at Telefonica, Barcelona, Spain
