From: Maria Stylianou
Date: Wed, 15 May 2013 18:56:15 +0200
Subject: Re: Questions on input/output format
To: user@giraph.apache.org

Cool, I didn't know that :)
So on the command line we have -eif for the edgeInputFormat and -vif
for the vertexInputFormat?

Keep us updated on how it works and what other difficulties you run into!


On Wed, May 15, 2013 at 6:36 PM, Alessandro Presta wrote:

> Hi Han,
>
> You are correct: if you are loading the graph with an EdgeInputFormat
> but also need to load additional data for vertices, you want to use a
> VertexValueInputFormat.
> You can see an example in TestEdgeInput.
>
> Alessandro
>
> From: Han JU
> Reply-To: "user@giraph.apache.org"
> Date: Wednesday, May 15, 2013 9:00 AM
> To: "user@giraph.apache.org"
> Subject: Re: Questions on input/output format
>
> Thanks Maria.
>
> For the input part, what I actually want to load is a bipartite graph, so
> the nodes are in two separate sets. If I use TextEdgeInputFormat, how can I
> load data for the nodes (for example, a flag indicating which set a node
> belongs to)?
>
> On the website it says: In the second case, edges will be read by means
> of an EdgeInputFormat. If there is additional data for the vertices, it
> will be read separately by a VertexValueInputFormat. So it seems to me
> that there should be two separate reads: the first one reads all the edges
> of the bipartite graph, and the second one reads the nodes with their data.
> But I can't find any examples of how to do this.
>
>
> 2013/5/15 Maria Stylianou
>
>> The InputFormat is the code needed to read the input file. So you
>> cannot have two InputFormats; you should choose one of the two.
>> From my understanding, TextEdgeInputFormat is more suitable for you, as it
>> takes exactly the format of your input file: node1 node2 edgeValue
>> The TextVertexInputFormat reads files with the format:
>> nodeId nodeValue {list with edges values}
>>
>> As for the output format, if you want to print several parameters/results
>> from your code, then I would suggest creating your own output format that
>> extends TextVertexOutputFormat; in convertVertexToLine() you can say what
>> should be printed for each vertex.
>> For example, say each vertex calculates an error that you can retrieve
>> through a public method getError(). In convertVertexToLine(), you can have
>> int error = ((yourMainCodeName) vertex).getError();
>>
>> and then you shape the line to be printed for each vertex, for example:
>> Text line = new Text("vertexId: " + vertex.getId().toString() + ", error: "
>> + error);
>> return line;
>>
>> I hope I didn't make it more complicated :)
>> Cheers,
>>
>> On Wed, May 15, 2013 at 12:27 PM, Han JU wrote:
>>
>>> Hi,
>>>
>>> Some questions:
>>>
>>> - My input file is a text file with edges: node1 node2 edgeValue. I
>>> figured out that I should use TextEdgeInputFormat and
>>> TextVertexValueInputFormat. But how do these two things fit together?
>>> Should I prepare another file that contains only the node information for
>>> VertexValueInputFormat?
>>>
>>> - If the input file is a sequence file, how should I implement a
>>> SequenceEdgeInputFormat or SequenceVertexInputFormat? Or do they already
>>> exist?
>>>
>>> - For the output part, after the calculation terminates, every vertex
>>> needs to output many lines. This could be big (for one dataset the output
>>> size is 400GB). I found only TextVertexOutputFormat, but it seems to
>>> output a single line per vertex. How should I achieve this?
>>>
>>> Thanks a lot!
>>>
>>> --
>>> JU Han
>>>
>>> Software Engineer Intern @ KXEN Inc.
>>> UTC - Université de Technologie de Compiègne
>>> GI06 - Fouille de Données et Décisionnel
>>>
>>> +33 0619608888
>>
>> --
>> Maria Stylianou
>> Intern at Telefonica, Barcelona, Spain
>> marsty5.wordpress.com
>
> --
> JU Han
>
> Software Engineer Intern @ KXEN Inc.
> UTC - Université de Technologie de Compiègne
> GI06 - Fouille de Données et Décisionnel
>
> +33 0619608888

-- 
Maria Stylianou
Intern at Telefonica, Barcelona, Spain
marsty5.wordpress.com
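
Editorial sketch: the thread describes two separate reads for a bipartite graph (an edge file with lines of the form node1 node2 edgeValue, plus a second file carrying per-vertex data) and a per-vertex output line shaped in convertVertexToLine(). The plain-Java sketch below only illustrates that data layout and line shaping. It does not use or reproduce any Giraph API: the class BipartiteInputSketch, its methods, and the "nodeId setFlag" layout of the vertex file are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustration only; none of these names are Giraph classes. */
class BipartiteInputSketch {

    /** One parsed line of the edge file: "node1 node2 edgeValue". */
    public static final class EdgeLine {
        public final String source;
        public final String target;
        public final double value;

        public EdgeLine(String source, String target, double value) {
            this.source = source;
            this.target = target;
            this.value = value;
        }
    }

    /** First read: split an edge line into source, target, and edge value. */
    public static EdgeLine parseEdgeLine(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length != 3) {
            throw new IllegalArgumentException(
                "Expected 'node1 node2 edgeValue' but got: " + line);
        }
        return new EdgeLine(tokens[0], tokens[1], Double.parseDouble(tokens[2]));
    }

    /**
     * Second read: record which set a vertex belongs to.
     * The "nodeId setFlag" layout is a hypothetical choice for this sketch.
     */
    public static void parseVertexValueLine(String line, Map<String, Integer> setOfVertex) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length != 2) {
            throw new IllegalArgumentException(
                "Expected 'nodeId setFlag' but got: " + line);
        }
        setOfVertex.put(tokens[0], Integer.parseInt(tokens[1]));
    }

    /** Shapes one output line per vertex, as in the convertVertexToLine() example. */
    public static String formatVertexLine(String vertexId, int error) {
        return "vertexId: " + vertexId + ", error: " + error;
    }

    public static void main(String[] args) {
        // Edge file and vertex file are read independently of each other.
        EdgeLine edge = parseEdgeLine("u1 i7 4.5");

        Map<String, Integer> setOfVertex = new LinkedHashMap<>();
        parseVertexValueLine("u1 0", setOfVertex);
        parseVertexValueLine("i7 1", setOfVertex);

        System.out.println(edge.source + " -> " + edge.target + " (" + edge.value + ")");
        System.out.println(setOfVertex);
        System.out.println(formatVertexLine(edge.source, 3));
    }
}
```

In an actual Giraph job the two reads would be handled by the edge input format and the vertex value input format that Alessandro mentions; TestEdgeInput, which he points to, is the place to look for the real wiring.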