giraph-user mailing list archives

From Steven Harenberg <sdhar...@ncsu.edu>
Subject Re: How to format Giraph input dataset
Date Fri, 13 Mar 2015 13:54:35 GMT
Hi Ralph,

I also wanted to use an edge-list input format, since I am running
examples from SNAP. I ran into a lot of issues, and at this point, if I could
go back in time, I would probably just write a script to convert the graphs
into Giraph's standard format.
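
A minimal sketch of such a conversion script, assuming the target is the
JSON adjacency format shown in the Giraph quick start
([vertex_id, vertex_value, [[target_id, edge_value], ...]]). The function
name edge_list_to_giraph_json and the default vertex value 0 and edge value 1
are illustrative assumptions, not anything Giraph prescribes. Vertices that
only appear as edge targets are written out with an empty edge list, so
every vertex gets a line of its own.

```python
import json


def edge_list_to_giraph_json(lines, vertex_value=0, edge_value=1):
    """Convert SNAP-style edge-list lines into JSON adjacency lines.

    Each input line holds "source target" separated by whitespace.
    Blank lines and lines starting with '#' are skipped.
    The default values are assumptions chosen for illustration.
    """
    adjacency = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        source, target = (int(token) for token in line.split()[:2])
        adjacency.setdefault(source, []).append([target, edge_value])
        # Make sure vertices with only incoming edges still get an entry.
        adjacency.setdefault(target, [])
    return [
        json.dumps([vertex, vertex_value, adjacency[vertex]], separators=(",", ":"))
        for vertex in sorted(adjacency)
    ]
```

To use it on a file, pass an open file handle as `lines` and write each
returned string on its own line of the output file. The whole graph is held
in memory, which may not be practical for very large datasets.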

To deal with the type of errors you had above, I created my own class files:

   - LongFloatTextEdgeInputFormat.java (for pagerank)
   - LongNullTextEdgeInputFormat.java
   - LongNullReverseTextEdgeInputFormat.java (for undirected)
   - LongPair (used inside the above classes)

Basically, these are just the same as their corresponding Int class files,
but with long vertex IDs.

However, there is a fundamental issue with SSSP (and, I believe, PageRank)
when using an edge-list input format. If a vertex is never listed first in
an edge (e.g., it only has incoming edges), it will not be "active" in
superstep 0. This means it will not be initialized with the correct value (
http://mail-archives.apache.org/mod_mbox/giraph-user/201502.mbox/%3CCAHv2Baw7zFJ-s7dtNMv5dkNxz_zE436krE%2B6G4r3tp-HVgjW2g%40mail.gmail.com%3E
).
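
To check whether a given dataset is affected, one can list the vertices
that never appear as the source of an edge. The sketch below assumes a
SNAP-style file with one "source target" pair per line and '#' comment
lines; the function name target_only_vertices is hypothetical and the code
is independent of Giraph.

```python
def target_only_vertices(lines):
    """Return the sorted IDs of vertices that only appear as edge targets."""
    sources = set()
    targets = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        source, target = line.split()[:2]
        sources.add(int(source))
        targets.add(int(target))
    # Vertices never listed first in any edge.
    return sorted(targets - sources)
```

If the returned list is non-empty, those are the vertices that would be
missing in superstep 0 when the graph is loaded from the edge list alone.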

On Thu, Mar 12, 2015 at 11:04 AM, MengXiaodong <mengxiaodong1985@gmail.com>
wrote:

> Hi Martin,
>
> Thank you for your kind reply. I followed your suggestion and entered the
> command below:
>
> *hadoop
> jar giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-0.20.203.0-jar-with-dependencies.jar
> org.apache.giraph.GiraphRunner
> org.apache.giraph.examples.SimpleShortestPathsComputation
> -eif org.apache.giraph.io.formats.IntNullTextEdgeInputFormat -eip
> /WikiTalk.txt -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat
> -op /outputTran -w 1*
>
> However, I got an error when I tried this command:
> Exception in thread "main" java.lang.IllegalArgumentException:
> checkClassTypes: vertex index types not assignable, computation - class
> org.apache.hadoop.io.LongWritable, EdgeInputFormat - class
> org.apache.hadoop.io.NullWritable
>   at org.apache.giraph.job.GiraphConfigurationValidator.checkAssignable(GiraphConfigurationValidator.java:384)
>   at org.apache.giraph.job.GiraphConfigurationValidator.verifyEdgeInputFormatGenericTypes(GiraphConfigurationValidator.java:242)
>   at org.apache.giraph.job.GiraphConfigurationValidator.validateConfiguration(GiraphConfigurationValidator.java:142)
>   at org.apache.giraph.utils.ConfigurationUtils.parseArgs(ConfigurationUtils.java:222)
>   at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:74)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>   at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:124)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:483)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
>
>
> I assume that the error happens because the input format uses IntWritable
> while the example uses LongWritable as the vertex ID. If so, may I ask how
> to convert IntWritable to LongWritable?
>
> Kind regards,
> Ralph
>
> On Mar 11, 2015, at 4:02 PM, Martin Junghanns <martin.junghanns@gmx.net>
> wrote:
>
> Hi Ralph,
>
> you can set a vertex or edge input format when running a Giraph job.
> In the example, you used the vertex input format (vif)
>
> "-vif
> org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat"
>
> Your wikitalk input format is an edge list and Giraph offers, e.g.,
>
> "org.apache.giraph.io.formats.IntNullTextEdgeInputFormat"
>
> which reads a graph where "Each line consists of: source_vertex,
> target_vertex" (separated by a \t)
>
> You can set the edge input format via the -eif parameter.
>
> Cheers,
> Martin
>
> The package "org.apache.giraph.io.formats" in giraph-core contains a lot
> more formats.
>
> On 11.03.2015 06:37, MengXiaodong wrote:
>
> Hi all,
>
> I'm new to Giraph, and I have successfully run my first example by
> following the instructions in Giraph - Quick Start. However, I ran into a
> question when writing my own Giraph code.
>
> In the "quick start", the format of the input graph is as follows:
>
> [0,0,[[1,1],[3,3]]]
> [1,0,[[0,1],[2,2],[3,1]]]
> [2,0,[[1,2],[4,4]]]
> [3,0,[[0,3],[1,1],[4,4]]]
> [4,0,[[3,4],[2,4]]]
>
> But graph datasets (such as the Facebook and Twitter social networks)
> downloaded from public websites come in various formats. How can I
> transform such a graph into the standard Giraph format shown above?
>
> For example, the WikiTalk graph below is a directed graph.
> A directed edge A->B means user A edited the talk page of B.
>
> # FromNodeId ToNodeId
> 0 1
> 2 1
> 2 21
> 2 46
> 2 63
> 2 88
> 2 93
> 2 94
> 2 101
> 2 102
> 2 103
> 2 116
> 2 119
> 2 125
>
> Regards, Ralph
>
>
>
