giraph-user mailing list archives

From Han JU <ju.han.fe...@gmail.com>
Subject Re: Questions on input/output format
Date Fri, 17 May 2013 15:01:32 GMT
Thank you Alessandro. I've learned a lot from the test cases.


2013/5/15 Maria Stylianou <marsty5@gmail.com>

> Cool, I didn't know that :) So in the command line we have the -eif for
> the edgeInputFormat and -vif for the vertexInputFormat?
> Keep us updated on how it works and what other difficulties you run into!
>
>
>
> On Wed, May 15, 2013 at 6:36 PM, Alessandro Presta <alessandro@fb.com> wrote:
>
>>  Hi Han,
>>
>>  You are correct: if you are loading the graph with an EdgeInputFormat,
>> but also need to load additional data for vertices, you want to use a
>> VertexValueInputFormat.
>> You can see an example in TestEdgeInput.
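
The two separate reads described here can be pictured with a minimal plain-Java sketch. It deliberately does not use the Giraph API: the class name, the method names, and the "nodeId side" vertex-file layout are all assumptions made for illustration. It only shows the idea that the edges and the per-vertex data come from two independent inputs and are joined by vertex id.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: plain Java standing in for the two input passes, not Giraph code.
public class BipartiteLoadSketch {

    /** Pass 1: parse "node1 node2 edgeValue" lines into source -> (target -> value). */
    public static Map<String, Map<String, Double>> readEdges(List<String> lines) {
        Map<String, Map<String, Double>> adjacency = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            adjacency.computeIfAbsent(parts[0], k -> new HashMap<>())
                     .put(parts[1], Double.parseDouble(parts[2]));
        }
        return adjacency;
    }

    /** Pass 2: parse "nodeId side" lines (hypothetical layout) into nodeId -> side flag. */
    public static Map<String, String> readVertexValues(List<String> lines) {
        Map<String, String> values = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            values.put(parts[0], parts[1]);
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> edges =
            readEdges(Arrays.asList("u1 i1 4.0", "u1 i2 2.5", "u2 i1 3.0"));
        Map<String, String> sides =
            readVertexValues(Arrays.asList("u1 LEFT", "u2 LEFT", "i1 RIGHT", "i2 RIGHT"));
        // Each vertex now has its side flag plus its out-edges (possibly none).
        for (String id : sides.keySet()) {
            System.out.println(id + " side=" + sides.get(id)
                + " edges=" + edges.getOrDefault(id, new HashMap<>()));
        }
    }
}
```

In an actual job the two passes would be handled by the EdgeInputFormat and the VertexValueInputFormat respectively; TestEdgeInput, mentioned above, is the place to look for the real wiring.
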
>>
>>  Alessandro
>>
>>   From: Han JU <ju.han.felix@gmail.com>
>> Reply-To: "user@giraph.apache.org" <user@giraph.apache.org>
>> Date: Wednesday, May 15, 2013 9:00 AM
>> To: "user@giraph.apache.org" <user@giraph.apache.org>
>> Subject: Re: Questions on input/output format
>>
>>   Thanks Maria.
>>
>>  For the input part, in fact what I want to load is a bipartite graph,
>> so nodes are in two separate sets. If I use TextEdgeInputFormat, how could
>> I load data for the nodes? (for example a flag indicating in which set the
>> node is).
>>
>>  On the website it says: In the second case, edges will be read by means
>> of an EdgeInputFormat. If there is additional data for the vertices, it
>> will be read separately by a VertexValueInputFormat. So it seems to me
>> that there should be two separate reads: the first one reads all the edges
>> of the bipartite graph, and the second one reads the nodes with their data.
>> But I can't find any examples of how to do this.
>>
>>
>>
>>
>> 2013/5/15 Maria Stylianou <marsty5@gmail.com>
>>
>>>  The InputFormat is the code needed to read the input file, so you
>>> cannot have two InputFormats; you should choose one of the two.
>>> From my understanding, TextEdgeInputFormat is more suitable for you as
>>> it takes exactly the format of your input file: node1 node2 edgeValue
>>> The TextVertexInputFormat reads files with the format:
>>> nodeId nodeValue {list with edges values}
>>>
>>>  As for the outputFormat, if you want to print several
>>> parameters/results from your code, then I would suggest creating your own
>>> outputFormat that extends the TextVertexOutputFormat; in
>>> convertVertexToLine() you can specify what is printed for each vertex.
>>> For example, say each vertex calculates an error that you can
>>> retrieve through the public method getError(). In
>>> convertVertexToLine(), you can have
>>> int error = ((yourMainCodeName) vertex).getError();
>>>
>>>  and then you shape the line to be printed from each vertex, for
>>> example:
>>> Text line = new Text("vertexId: " + vertex.getId().toString() + ", error: "
>>> + error);
>>>  return line;
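
The line-building step above can be tried out in isolation with a small plain-Java sketch. It is not Giraph code: Hadoop's Text and the Giraph vertex are replaced by plain strings, and the class and method names are hypothetical. The second method is an assumption about how the "many lines per vertex" case from the original question might be shaped, namely by producing a list of lines that a custom writer would then emit one by one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: string formatting for vertex output, without Giraph or Hadoop types.
public class VertexLineSketch {

    /** Builds the single output line for a vertex, in the shape described in the thread. */
    public static String toLine(String vertexId, int error) {
        return "vertexId: " + vertexId + ", error: " + error;
    }

    /** Hypothetical variant: a vertex with several results yields one line per result. */
    public static List<String> toLines(String vertexId, List<String> results) {
        List<String> lines = new ArrayList<>();
        for (String result : results) {
            // Tab-separated "vertexId<TAB>result"; the separator is an arbitrary choice.
            lines.add(vertexId + "\t" + result);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(toLine("42", 3));
        for (String line : toLines("42", Arrays.asList("a", "b"))) {
            System.out.println(line);
        }
    }
}
```

Keeping the formatting in a pure function like this makes it easy to unit-test before it is moved into an output format class.
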
>>>
>>>  I hope I didn't make it more complicated :)
>>> Cheers,
>>>
>>> On Wed, May 15, 2013 at 12:27 PM, Han JU <ju.han.felix@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>>  Some questions:
>>>>
>>>>    - My input file is a text file with edges: node1 node2 edgeValue. I
>>>> figured out that I should use TextEdgeInputFormat and
>>>> TextVertexValueInputFormat. But how do these two things fit together?
>>>> Should I prepare another file that contains only the node information for
>>>> VertexValueInputFormat?
>>>>
>>>>    - If the input file is a sequence file, how should I implement a
>>>> SequenceEdgeInputFormat or SequenceVertexInputFormat? Or do they already exist?
>>>>
>>>>    - For the output part, after the calculation terminates, every vertex
>>>> needs to output many lines. This could be big (for
>>>> one dataset the output size is 400GB). I found only the TextVertexOutputFormat
>>>> but it seems to output a single line per vertex. How should I achieve this?
>>>>
>>>>  Thanks a lot!
>>>>
>>>>  --
>>>> *JU Han*
>>>>
>>>>    Software Engineer Intern @ KXEN Inc.
>>>>   UTC   -  Université de Technologie de Compiègne
>>>>    *     **GI06 - Fouille de Données et Décisionnel*
>>>>
>>>>  +33 0619608888
>>>>
>>>
>>>
>>>
>>>   --
>>> Maria Stylianou
>>> Intern at Telefonica, Barcelona, Spain
>>>  marsty5.wordpress.com <http://marsty5.wordpress.com>
>>>
>>>
>>
>>
>>
>
>
>
> --
> Maria Stylianou
> Intern at Telefonica, Barcelona, Spain
> marsty5.wordpress.com <http://marsty5.wordpress.com>
>
>


-- 
*JU Han*

Software Engineer Intern @ KXEN Inc.
UTC   -  Université de Technologie de Compiègne
*     **GI06 - Fouille de Données et Décisionnel*

+33 0619608888
