giraph-user mailing list archives

From Maria Stylianou <mars...@gmail.com>
Subject Re: Questions on input/output format
Date Wed, 15 May 2013 16:56:15 GMT
Cool, I didn't know that :) So in the command line we have the -eif for the
edgeInputFormat and -vif for the vertexInputFormat?
Keep us updated on how it works and what other difficulties you run into!
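
To make the two separate reads discussed in the quoted messages concrete, here is a minimal plain-Java sketch. It is not Giraph code: the class BipartiteInputSketch, its methods, and the "nodeId setFlag" layout of the vertex file are assumptions made for illustration only. The edge lines follow the node1 node2 edgeValue layout mentioned in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of reading edges and vertex data separately.
// None of these names are Giraph API; they are hypothetical.
class BipartiteInputSketch {

    // One parsed edge line: "node1 node2 edgeValue".
    static final class Edge {
        final String source;
        final String target;
        final double value;

        Edge(String source, String target, double value) {
            this.source = source;
            this.target = target;
            this.value = value;
        }
    }

    // First read: parse one whitespace-separated edge line.
    static Edge parseEdge(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Bad edge line: " + line);
        }
        return new Edge(parts[0], parts[1], Double.parseDouble(parts[2]));
    }

    // Second read: parse "nodeId setFlag" lines (assumed layout) into a map
    // from node id to the set the node belongs to.
    static Map<String, String> parseVertexValues(List<String> lines) {
        Map<String, String> sets = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Bad vertex line: " + line);
            }
            sets.put(parts[0], parts[1]);
        }
        return sets;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the two input files.
        List<String> edgeLines = Arrays.asList("u1 i1 4.0", "u2 i1 3.5");
        List<String> vertexLines = Arrays.asList("u1 L", "u2 L", "i1 R");

        List<Edge> edges = new ArrayList<>();
        for (String line : edgeLines) {
            edges.add(parseEdge(line));
        }
        Map<String, String> sets = parseVertexValues(vertexLines);

        // Combine both reads: each edge plus the set flag of its endpoints.
        for (Edge e : edges) {
            System.out.println(e.source + "(" + sets.get(e.source) + ") -> "
                    + e.target + "(" + sets.get(e.target) + ") value=" + e.value);
        }
    }
}
```

In Giraph itself the two reads would be done by the EdgeInputFormat and the VertexValueInputFormat rather than by hand; the sketch only shows how the two files relate to each other.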


On Wed, May 15, 2013 at 6:36 PM, Alessandro Presta <alessandro@fb.com> wrote:

>  Hi Han,
>
>  You are correct: if you are loading the graph with an EdgeInputFormat,
> but also need to load additional data for vertices, you want to use a
> VertexValueInputFormat.
> You can see an example in TestEdgeInput.
>
>  Alessandro
>
>   From: Han JU <ju.han.felix@gmail.com>
> Reply-To: "user@giraph.apache.org" <user@giraph.apache.org>
> Date: Wednesday, May 15, 2013 9:00 AM
> To: "user@giraph.apache.org" <user@giraph.apache.org>
> Subject: Re: Questions on input/output format
>
>   Thanks Maria.
>
>  For the input part, in fact what I want to load is a bipartite graph, so
> nodes are in two separate sets. If I use TextEdgeInputFormat, how could I
> load data for the nodes? (for example a flag indicating in which set the
> node is).
>
>  On the website it says: In the second case, edges will be read by means
> of an EdgeInputFormat. If there is additional data for the vertices, it
> will be read separately by a VertexValueInputFormat. So it seems to me
> that there should be two separate reads: the first one reads all the edges
> of the bipartite graph, and the second one reads the nodes with their data.
> But I can't find any examples of how to do this.
>
>
>
>
> 2013/5/15 Maria Stylianou <marsty5@gmail.com>
>
>>  The InputFormat is the code needed to read the input file. So, you
>> cannot have two InputFormats, you should choose one of the two.
>> From my understanding, TextEdgeInputFormat is more suitable for you as it
>> takes exactly the format of your input file: node1 node2 edgeValue
>> The TextVertexInputFormat reads files with the format:
>> nodeId nodeValue {list with edges values}
>>
>>  As for the outputFormat, if you want to print several
>> parameters/results from your code, then I would suggest creating your own
>> outputFormat that extends TextVertexOutputFormat; in
>> convertVertexToLine() you decide what is printed for each vertex.
>> For example, say each vertex calculates an error that you can
>> retrieve through a public method getError(). In
>> convertVertexToLine(), you can have
>> int error = ((yourMainCodeName) vertex).getError();
>>
>>  and then you shape the line to be printed for each vertex, for example:
>> Text line = new Text("vertexId: " + vertex.getId().toString() + ", error: "
>> + error);
>>  return line;
>>
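
The line-building step described above can be tried outside Giraph with a small plain-Java stand-in. This is only a sketch: the class VertexLineSketch and the method formatLine are hypothetical names, and the Giraph vertex and Hadoop Text types are replaced by a String id and an int error so the snippet runs on its own.

```java
// Plain-Java stand-in for the body of convertVertexToLine().
// VertexLineSketch and formatLine are hypothetical names, not Giraph API.
class VertexLineSketch {

    // Build the single output line for one vertex from its id and error.
    static String formatLine(String vertexId, int error) {
        return "vertexId: " + vertexId + ", error: " + error;
    }

    public static void main(String[] args) {
        // Made-up sample values for one vertex.
        System.out.println(formatLine("42", 3));
    }
}
```

Inside a real output format, the resulting string would be wrapped in a Text and returned from convertVertexToLine().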
>>  I hope I didn't make it more complicated :)
>> Cheers,
>>
>> On Wed, May 15, 2013 at 12:27 PM, Han JU <ju.han.felix@gmail.com> wrote:
>>
>>> Hi,
>>>
>>>  Some questions:
>>>
>>>    - My input file is a text file with edges: node1 node2 edgeValue. I
>>> figured out that I should use TextEdgeInputFormat and
>>> TextVertexValueInputFormat. But how do these two things fit together?
>>> Should I prepare another file that contains only the node information for
>>> VertexValueInputFormat?
>>>
>>>    - If the input file is a sequence file, how should I implement a
>>> SequenceEdgeInputFormat or SequenceVertexInputFormat? Or do they exist already?
>>>
>>>    - For the output part, after the calculation
>>> terminates, every vertex needs to output many lines. This could be big (for
>>> one dataset the output size is 400GB). I found only TextVertexOutputFormat,
>>> but it seems to output a single line per vertex. How should I achieve this?
>>>
>>>  Thanks a lot!
>>>
>>>  --
>>> *JU Han*
>>>
>>>    Software Engineer Intern @ KXEN Inc.
>>>   UTC   -  Université de Technologie de Compiègne
>>>    *     **GI06 - Fouille de Données et Décisionnel*
>>>
>>>  +33 0619608888
>>>
>>
>>
>>
>>   --
>> Maria Stylianou
>> Intern at Telefonica, Barcelona, Spain
>>  marsty5.wordpress.com <http://marsty5.wordpress.com>
>>
>>
>
>
>



-- 
Maria Stylianou
Intern at Telefonica, Barcelona, Spain
marsty5.wordpress.com <http://marsty5.wordpress.com>
