giraph-user mailing list archives

From Han JU <>
Subject Re: Questions on input/output format
Date Wed, 15 May 2013 16:00:11 GMT
Thanks Maria.

For the input part, what I actually want to load is a bipartite graph, so
the nodes fall into two separate sets. If I use TextEdgeInputFormat, how can I
load data for the nodes (for example, a flag indicating which set a node
belongs to)?

On the website it says: "In the second case, edges will be read by means of
an EdgeInputFormat. If there is additional data for the vertices, it will
be read separately by a VertexValueInputFormat." So it seems to me that
there should be two separate reads: the first reads all the edges of the
bipartite graph, and the second reads the nodes with their data. But I
can't find any examples of how to do this.
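Whatever Giraph classes end up doing the reading, the two-pass load described above (edges from one file, per-node data such as the set flag from another, matched on node id) can be sketched in plain Java. This is a minimal sketch, not Giraph code: it assumes a hypothetical vertex-value file with lines `vertexId side` (side is `L` or `R`), and the class and method names are invented for illustration. In Giraph itself, the first pass would be the job of an EdgeInputFormat and the second of a VertexValueInputFormat, as the quoted website text says.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical plain-Java sketch of joining an edge file with a vertex-value file. */
public class BipartiteLoadSketch {

    /** One edge read from the edge file: target id and edge value. */
    public static final class Edge {
        public final long target;
        public final double value;

        public Edge(long target, double value) {
            this.target = target;
            this.value = value;
        }
    }

    /** What is known about one vertex after both inputs have been read. */
    public static final class VertexData {
        public String side; // "L" or "R"; null if no vertex-value line mentioned this vertex
        public final List<Edge> edges = new ArrayList<>();
    }

    /** Joins edge lines and vertex-value lines on vertex id. */
    public static Map<Long, VertexData> load(List<String> edgeLines, List<String> vertexValueLines) {
        Map<Long, VertexData> graph = new HashMap<>();

        // Pass 1: edges ("srcId dstId edgeValue"), the part an edge input format handles.
        for (String line : edgeLines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] f = line.trim().split("\\s+");
            long src = Long.parseLong(f[0]);
            long dst = Long.parseLong(f[1]);
            double value = Double.parseDouble(f[2]);
            graph.computeIfAbsent(src, id -> new VertexData()).edges.add(new Edge(dst, value));
            // This sketch also creates the target, so a vertex with no out-edges still exists.
            graph.computeIfAbsent(dst, id -> new VertexData());
        }

        // Pass 2: vertex values ("vertexId side"), the part a vertex value input format handles.
        for (String line : vertexValueLines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] f = line.trim().split("\\s+");
            long id = Long.parseLong(f[0]);
            graph.computeIfAbsent(id, k -> new VertexData()).side = f[1];
        }
        return graph;
    }

    public static void main(String[] args) {
        Map<Long, VertexData> g = load(
                List.of("1 10 0.5", "2 10 1.5", "1 11 2.0"),
                List.of("1 L", "2 L", "10 R", "11 R"));
        // One line per vertex; HashMap iteration order is not guaranteed.
        g.forEach((id, v) -> System.out.println(id + " side=" + v.side + " edges=" + v.edges.size()));
    }
}
```

Note that the edge file alone never mentions which set a node is in; the flag only exists after the second pass, which is why the two inputs are joined on the vertex id.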

2013/5/15 Maria Stylianou <>

> The InputFormat is the code needed to read the input file. So you cannot
> have two InputFormats; you should choose one of the two.
> As I understand it, TextEdgeInputFormat suits you better, since it
> takes exactly the format of your input file: node1 node2 edgeValue.
> TextVertexInputFormat reads files with the format:
> nodeId nodeValue {list of edges with their values}
> As for the output format, if you want to print several parameters/results
> from your code, I would suggest creating your own output format that
> extends TextVertexOutputFormat; in convertVertexToLine() you decide
> what gets printed for each vertex.
> For example, suppose each vertex computes an error that you can retrieve
> through a public getError() method. Then in convertVertexToLine() you
> can write:
> int error = ((yourMainCodeName) vertex).getError();
> and then you build the line to be printed for each vertex, for example:
> Text line = new Text("vertexId: " + vertex.getId().toString() +
>     ", error: " + error);
> return line;
> I hope I didn't make it more complicated :)
> Cheers,
> On Wed, May 15, 2013 at 12:27 PM, Han JU <> wrote:
>> Hi,
>> Some questions:
>>   - My input file is a text file with edges: node1 node2 edgeValue. I
>> figured out that I should use TextEdgeInputFormat and
>> TextVertexValueInputFormat. But how do these two fit together?
>> Should I prepare another file that contains only the node information for
>> VertexValueInputFormat?
>>   - If the input file is a sequence file, how should I implement a
>> SequenceEdgeInputFormat or SequenceVertexInputFormat? Or do they exist already?
>>   - For the output part, after the calculation terminates, every vertex
>> needs to output many lines. This could be big (for one dataset the output
>> size is 400GB). I found only TextVertexOutputFormat, but it seems to
>> output a single line per vertex. How should I achieve this?
>> Thanks a lot!
>> --
>> *JU Han*
>> Software Engineer Intern @ KXEN Inc.
>> UTC   -  Université de Technologie de Compiègne
>> GI06 - Fouille de Données et Décisionnel
>> +33 0619608888
> --
> Maria Stylianou
> Intern at Telefonica, Barcelona, Spain
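Han's third question (many lines per vertex) and Maria's convertVertexToLine() suggestion can be reconciled in two ways, sketched below in plain Java without Giraph dependencies. VertexResult and its fields are hypothetical stand-ins for a vertex's final state. The first method builds one string with embedded newlines, which is what a single convertVertexToLine() call would have to return; this assumes the text writer underneath writes the returned text verbatim followed by a newline, so each embedded newline starts a new output line. The second writes each line as soon as it is produced, which avoids building one large string per vertex when the total output runs to hundreds of gigabytes; in Giraph that would mean overriding the per-vertex write method of the text vertex writer instead of convertVertexToLine(), assuming the Giraph version in use exposes one.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.List;

/** Hypothetical plain-Java sketch of emitting several output lines for one vertex. */
public class MultiLineOutputSketch {

    /** Hypothetical vertex state: its id and one result per output line. */
    public static final class VertexResult {
        public final long id;
        public final List<String> results;

        public VertexResult(long id, List<String> results) {
            this.id = id;
            this.results = results;
        }
    }

    /** Option 1: one string with embedded newlines, one line per result. */
    public static String toMultiLineText(VertexResult v) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < v.results.size(); i++) {
            if (i > 0) {
                sb.append('\n'); // separator only; the final newline is assumed to come from the line writer
            }
            sb.append(v.id).append('\t').append(v.results.get(i));
        }
        return sb.toString();
    }

    /** Option 2: stream each line to the output as soon as it is produced. */
    public static void writeLines(VertexResult v, Writer out) throws IOException {
        for (String r : v.results) {
            out.write(v.id + "\t" + r);
            out.write('\n');
        }
    }

    public static void main(String[] args) throws IOException {
        VertexResult v = new VertexResult(7L, List.of("a 0.1", "b 0.2"));
        System.out.println(toMultiLineText(v));

        StringWriter sw = new StringWriter();
        writeLines(v, sw);
        System.out.print(sw);
    }
}
```

Both options produce the same lines; the difference is only whether a whole vertex's output is held in memory at once.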

*JU Han*

Software Engineer Intern @ KXEN Inc.
UTC   -  Université de Technologie de Compiègne
GI06 - Fouille de Données et Décisionnel

+33 0619608888
