giraph-user mailing list archives

From Maria Stylianou <mars...@gmail.com>
Subject Re: Questions on input/output format
Date Wed, 15 May 2013 14:32:34 GMT
The InputFormat is the code that reads the input file, so you cannot have
two InputFormats; you should choose one of the two.
From my understanding, TextEdgeInputFormat suits you better, because it
reads exactly the format of your input file:
  node1 node2 edgeValue
TextVertexInputFormat, on the other hand, reads files with one vertex per line:
  nodeId nodeValue {list of edges with their values}
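To make the difference between the two layouts concrete, here is a small plain-Java sketch. It does not use Giraph's actual TextEdgeInputFormat API. The class EdgeListSketch and its methods are hypothetical. It parses lines of the form "node1 node2 edgeValue" and groups them by source vertex, which shows that each line carries a single edge and that one vertex's edges are spread over many lines. It assumes whitespace-separated fields, long vertex IDs and double edge values.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Giraph: it only illustrates the
// one-edge-per-line layout "node1 node2 edgeValue".
final class EdgeListSketch {

    /** One parsed line: source id, target id and edge value. */
    static final class Edge {
        final long source;
        final long target;
        final double value;

        Edge(long source, long target, double value) {
            this.source = source;
            this.target = target;
            this.value = value;
        }
    }

    /** Parses a single whitespace-separated "node1 node2 edgeValue" line. */
    static Edge parseLine(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 3) {
            throw new IllegalArgumentException("expected 3 fields, got: " + line);
        }
        return new Edge(Long.parseLong(fields[0]),
                        Long.parseLong(fields[1]),
                        Double.parseDouble(fields[2]));
    }

    /**
     * Groups edges by source vertex: sourceId -> (targetId -> edgeValue).
     * A source id may appear on many lines; each line adds one outgoing edge.
     */
    static Map<Long, Map<Long, Double>> groupBySource(Iterable<String> lines) {
        Map<Long, Map<Long, Double>> adjacency = new TreeMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            Edge e = parseLine(line);
            adjacency.computeIfAbsent(e.source, k -> new TreeMap<>())
                     .put(e.target, e.value);
        }
        return adjacency;
    }
}
```

For example, the lines "1 2 0.5", "1 3 1.5" and "2 3 2.0" give vertex 1 two outgoing edges and vertex 2 one. Vertex 3 only appears as a target, so it gets no entry of its own in this sketch.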

As for the output format: if you want to print several parameters/results
from your code, I would suggest creating your own output format that
extends TextVertexOutputFormat. In its convertVertexToLine() method you
decide what is printed for each vertex.
For example, suppose each vertex calculates an error that you can retrieve
through a public method getError(). In convertVertexToLine() you can then
write:
int error = ((yourMainCodeName) vertex).getError();

and then build the line to be printed for each vertex, for example:
Text line = new Text("vertexId: " + vertex.getId().toString() + ", error: " +
error);
return line;
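The cast-and-format step can be sketched in plain Java. BaseVertex, ErrorVertex, getDetails() and OutputSketch.formatVertex are hypothetical stand-ins. They represent Giraph's vertex class, your own vertex subclass and the body of convertVertexToLine(). They only mimic the shape of that code, and they return a String instead of Hadoop's Text. For the question about writing many lines per vertex, the sketch also assumes that joining the results with '\n' produces several lines in a plain text output file. This depends on the writer emitting the returned text unchanged and has not been checked against Giraph's writer.

```java
import java.util.List;

// Hypothetical stand-in, not a Giraph class: a base vertex that only has an id.
abstract class BaseVertex {
    private final long id;

    BaseVertex(long id) {
        this.id = id;
    }

    long getId() {
        return id;
    }
}

/** Hypothetical vertex subclass that exposes values computed during the job. */
class ErrorVertex extends BaseVertex {
    private final double error;
    private final List<String> details; // extra per-vertex results, possibly many

    ErrorVertex(long id, double error, List<String> details) {
        super(id);
        this.error = error;
        this.details = details;
    }

    public double getError() {
        return error;
    }

    public List<String> getDetails() {
        return details;
    }
}

final class OutputSketch {
    /** Mirrors convertVertexToLine(): cast to the concrete type, then build the text. */
    static String formatVertex(BaseVertex vertex) {
        ErrorVertex v = (ErrorVertex) vertex; // same idea as ((yourMainCodeName) vertex)
        StringBuilder sb = new StringBuilder();
        sb.append("vertexId: ").append(v.getId()).append(", error: ").append(v.getError());
        // Assumption: each '\n' becomes a separate line in the output file
        // if the writer emits the returned text unchanged.
        for (String detail : v.getDetails()) {
            sb.append('\n').append("vertexId: ").append(v.getId()).append(", ").append(detail);
        }
        return sb.toString();
    }
}
```

Repeating the vertex id on every extra line keeps each line self-describing. With very large outputs, such as the 400GB mentioned in the question, this should make the files easier to post-process line by line.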

I hope I didn't make it more complicated :)
Cheers,

On Wed, May 15, 2013 at 12:27 PM, Han JU <ju.han.felix@gmail.com> wrote:

> Hi,
>
> Some questions:
>
>   - My input file is a text file with edges: node1 node2 edgeValue. I
> figured out that I should use TextEdgeInputFormat and
> TextVertexValueInputFormat. But how do these two things fit together?
> Should I prepare another file that contains only the node information for
> VertexValueInputFormat?
>
>   - If the input file is a sequence file, how should I implement a
> SequenceEdgeInputFormat or SequenceVertexInputFormat? Or do they already exist?
>
>   - For the output part, what I need is that, after the calculation
> terminates, every vertex outputs many lines. This could be big (for
> one dataset the output size is 400GB). I found only TextVertexOutputFormat,
> but it seems to output a single line per vertex. How should I achieve this?
>
> Thanks a lot!
>
> --
> *JU Han*
>
> Software Engineer Intern @ KXEN Inc.
> UTC - Université de Technologie de Compiègne
> GI06 - Data Mining and Decision Support (Fouille de Données et Décisionnel)
>
> +33 0619608888
>



-- 
Maria Stylianou
Intern at Telefonica, Barcelona, Spain
marsty5.wordpress.com
