From: Steven Harenberg
Date: Fri, 13 Mar 2015 09:54:35 -0400
Subject: Re: How to format Giraph input dataset
To: user
Reply-To: user@giraph.apache.org

Hi Ralph,

I also wanted to use the edge-list input format, since I am running examples from SNAP. I ran into a lot of issues, and at this point, if I could go back in time, I would probably just write a script to convert the graphs into Giraph's standard format.
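
A minimal sketch of what such a conversion script might look like, assuming the target is the JSON vertex format from the Quick Start quoted further down ([id, value, [[target, weight], ...]]). The class name EdgeListToJson, the initial vertex value of 0, and the default edge weight of 1 are all assumptions made for illustration; the edge list itself carries no weights.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: turns edge-list lines into Quick Start style JSON vertex lines. */
final class EdgeListToJson {

    /**
     * @param lines edge-list lines such as "2 21" (tab or space separated);
     *              blank lines and lines starting with '#' are skipped
     * @return one "[id,0,[[target,1],...]]" line per vertex, sorted by id
     */
    static List<String> convert(List<String> lines) {
        // Every endpoint gets an entry, so vertices that only have
        // incoming edges still produce a line (with an empty edge list).
        Map<Long, List<Long>> adjacency = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            String[] tokens = trimmed.split("\\s+");
            long source = Long.parseLong(tokens[0]);
            long target = Long.parseLong(tokens[1]);
            adjacency.computeIfAbsent(source, k -> new ArrayList<>()).add(target);
            adjacency.computeIfAbsent(target, k -> new ArrayList<>());
        }

        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, List<Long>> entry : adjacency.entrySet()) {
            StringBuilder sb = new StringBuilder();
            // Assumed initial vertex value: 0.
            sb.append('[').append(entry.getKey()).append(",0,[");
            List<Long> targets = entry.getValue();
            for (int i = 0; i < targets.size(); i++) {
                if (i > 0) {
                    sb.append(',');
                }
                // Assumed default edge weight: 1.
                sb.append('[').append(targets.get(i)).append(",1]");
            }
            sb.append("]]");
            out.add(sb.toString());
        }
        return out;
    }
}
```

This keeps the whole graph in memory, so it only suits graphs that fit on one machine; a larger dataset would need a sort-based or distributed conversion instead.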

To deal with the type of errors you had, I created my own class files:
  • LongFloatTextEdgeInputFormat.java (for PageRank)
  • LongNullTextEdgeInputFormat.java
  • LongNullReverseTextEdgeInputFormat.java (for undirected graphs)
  • LongPair (used inside the above classes)

Basically, these are just the same as their corresponding Int class files, with long IDs in place of int.
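
The source of these classes is not included in this message. As a rough illustration only, a LongPair helper might look like the sketch below; the field names, the accessor names, and the parse method are assumptions, not the actual code. The Giraph-specific input format classes would then wrap these values in LongWritable (the type named in the error message) and are not shown, since they need Giraph and Hadoop on the classpath.

```java
/** Hypothetical long-based pair holding the two endpoints of an edge. */
final class LongPair {
    private final long first;
    private final long second;

    LongPair(long first, long second) {
        this.first = first;
        this.second = second;
    }

    /** Source vertex ID of the edge. */
    long getFirst() {
        return first;
    }

    /** Target vertex ID of the edge. */
    long getSecond() {
        return second;
    }

    /** Parses one edge-list line of the form "source<sep>target", where sep is a tab or a space. */
    static LongPair parse(String line) {
        String[] tokens = line.split("[\t ]");
        return new LongPair(Long.parseLong(tokens[0]), Long.parseLong(tokens[1]));
    }
}
```

Using long here means IDs larger than Integer.MAX_VALUE parse without overflow, which an int-based pair could not handle.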

However, there is a fundamental issue with SSSP (and, I believe, PageRank) when using an edge-list input format. If a vertex is never listed first in an edge (e.g., it only has incoming edges), it will not be "active" in superstep 0. This means it will not be initialized with the correct value (http://mail-archives.apache.org/mod_mbox/giraph-user/201502.mbox/%3CCAHv2Baw7zFJ-s7dtNMv5dkNxz_zE436krE%2B6G4r3tp-HVgjW2g%40mail.gmail.com%3E).
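
To check whether a given edge list is affected, one can look for vertex IDs that occur only as edge targets. The sketch below is a plain-Java illustration with no Giraph dependency; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical check for vertices that never appear first in any edge. */
final class TargetOnlyVertexFinder {

    /**
     * @param edges each element is {source, target}
     * @return IDs that appear as a target but never as a source, sorted ascending
     */
    static Set<Long> targetOnlyVertices(List<long[]> edges) {
        Set<Long> sources = new TreeSet<>();
        Set<Long> targets = new TreeSet<>();
        for (long[] edge : edges) {
            sources.add(edge[0]);
            targets.add(edge[1]);
        }
        // Whatever remains was never listed first in an edge.
        targets.removeAll(sources);
        return targets;
    }
}
```

For the first few WikiTalk edges quoted below (0 1, 2 1, 2 21, 2 46), this returns 1, 21, and 46: per the description above, those are the vertices that would not be active in superstep 0.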


On Thu, Mar 12, 2015 at 11:04 AM, MengXiaodong <mengxiaodong1985@gmail.com> wrote:
Hi Martin,

Thank you for your kind reply. I followed your suggestion and entered the command below:

hadoop jar giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-0.20.203.0-jar-with-dependencies.jar \
    org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsComputation \
    -eif org.apache.giraph.io.formats.IntNullTextEdgeInputFormat \
    -eip /WikiTalk.txt \
    -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
    -op /outputTran -w 1

However, I got an error when I tried this command:
Exception in thread "main" java.lang.IllegalArgumentException: checkClassTypes: vertex index types not assignable, computation - class org.apache.hadoop.io.LongWritable, EdgeInputFormat - class org.apache.hadoop.io.NullWritable
    at org.apache.giraph.job.GiraphConfigurationValidator.checkAssignable(GiraphConfigurationValidator.java:384)
    at org.apache.giraph.job.GiraphConfigurationValidator.verifyEdgeInputFormatGenericTypes(GiraphConfigurationValidator.java:242)
    at org.apache.giraph.job.GiraphConfigurationValidator.validateConfiguration(GiraphConfigurationValidator.java:142)
    at org.apache.giraph.utils.ConfigurationUtils.parseArgs(ConfigurationUtils.java:222)
    at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:74)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:124)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)



I assume that the error happens because the input format uses IntWritable while the example uses LongWritable as the vertex ID. If so, may I ask how to convert IntWritable to LongWritable?

Kind regards,
Ralph

On Mar 11, 2015, at 4:02 PM, Martin Junghanns <martin.junghanns@gmx.net> wrote:

Hi Ralph,

you can set a vertex or edge input format when running a Giraph job.
In the example, you used the vertex input format (-vif):

"-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat"

Your WikiTalk input is an edge list, and Giraph offers, e.g.,

"org.apache.giraph.io.formats.IntNullTextEdgeInputFormat"

which reads a graph where "Each line consists of: source_vertex, target_vertex" (separated by a \t).

You can set the edge input format via the -eif parameter.

Cheers,
Martin

The package "org.apache.giraph.io.formats" in giraph-core contains a lot more formats.

On 11.03.2015 06:37, MengXiaodong wrote:
Hi all,

I'm new to Giraph and have successfully run my first example by following the instructions in Giraph - Quick Start. However, I ran into a question while writing my own Giraph code.

In the "Quick Start", the format of the input graph is as follows:

[0,0,[[1,1],[3,3]]]
[1,0,[[0,1],[2,2],[3,1]]]
[2,0,[[1,2],[4,4]]]
[3,0,[[0,3],[1,1],[4,4]]]
[4,0,[[3,4],[2,4]]]

But the graph datasets (such as the Facebook and Twitter social networks) downloaded from public websites come in various formats. How can I transform such a graph into the standard Giraph format like the one above?

For example, take the WikiTalk graph below, which is a directed graph.
A directed edge A->B means user A edited the talk page of B.
# FromNodeId ToNodeId
0 1
2 1
2 21
2 46
2 63
2 88
2 93
2 94
2 101
2 102
2 103
2 116
2 119
2 125

Regards, Ralph

