giraph-user mailing list archives

From Kenrick Fernandes <kenrick....@gmail.com>
Subject Re: Input format problems running Giraph 1.1.0 on Twitter dataset
Date Sun, 26 Apr 2015 01:54:19 GMT
Hi Roman,

Thanks for the quick response. There is no vertex data in this
dataset though, and the vertex IDs posted above would fit in a
Long. Would you advise changing the PageRankComputation
formats, or working on a new input format?

Thanks,
Kenrick
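
For reference, the parsing that any input format for this dataset has to do can be sketched in plain Java. This is a minimal sketch, not Giraph code: the class EdgeListSketch, its methods, and the default vertex value of 0.0 are all hypothetical, chosen only to show long vertex IDs paired with a double vertex value when the file carries no vertex data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Giraph: parses "source target" edge lines. */
public class EdgeListSketch {
    /** Assumed default vertex value, since the dataset has no vertex data. */
    public static final double DEFAULT_VERTEX_VALUE = 0.0;

    /** Splits one line on whitespace and returns {sourceId, targetId} as longs. */
    public static long[] parseEdge(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length != 2) {
            throw new IllegalArgumentException("Expected two IDs, got: " + line);
        }
        return new long[] {Long.parseLong(tokens[0]), Long.parseLong(tokens[1])};
    }

    /** Groups edges by source ID; targets are also registered as vertices. */
    public static Map<Long, List<Long>> buildAdjacency(List<String> lines) {
        Map<Long, List<Long>> adjacency = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            long[] edge = parseEdge(line);
            adjacency.computeIfAbsent(edge[0], id -> new ArrayList<>()).add(edge[1]);
            // A vertex that only appears as a target still needs an entry.
            adjacency.computeIfAbsent(edge[1], id -> new ArrayList<>());
        }
        return adjacency;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "61578010 61147436", "61578037 61147436", "61578040 61147436");
        buildAdjacency(sample).forEach((id, targets) ->
            System.out.println(id + " value=" + DEFAULT_VERTEX_VALUE + " out=" + targets));
    }
}
```

In a real input format the long and double would be wrapped in LongWritable and DoubleWritable to match what the computation expects.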

On Sat, Apr 25, 2015 at 7:40 PM, Roman Shaposhnik <roman@shaposhnik.org>
wrote:

> One of the slightly annoying things in Giraph is that you have
> to manually match your input format to your computation. In
> your case, PageRankComputation requires LongWritable for
> vertex ID and DoubleWritable for vertex Data. You may need
> to hack one of the existing formats slightly.
>
>
> Thanks,
> Roman.
>
> On Sat, Apr 25, 2015 at 2:58 PM, Kenrick Fernandes
> <kenrick.f15@gmail.com> wrote:
> > Hello,
> >
> > I'm trying to get Giraph to read the Twitter dataset as input for the
> > SimplePageRankComputation program. The dataset format looks like this:
> > 61578010 61147436
> > 61578037 61147436
> > 61578040 61147436
> > (vertex IDs, with pairs representing edges)
> >
> > When I run the command with
> > -vif org.apache.giraph.io.formats.IntIntNullTextInputFormat, I get this
> > error :
> > java.lang.IllegalArgumentException: checkClassTypes: vertex index types not
> > assignable, computation - class org.apache.hadoop.io.LongWritable,
> > VertexInputFormat - class org.apache.hadoop.io.IntWritable
> >
> > So I tried running the command with
> > -vif org.apache.giraph.io.formats.LongLongNullTextInputFormat and I get a
> > different one:
> > java.lang.IllegalArgumentException: checkClassTypes: vertex value types not
> > assignable, computation - class org.apache.hadoop.io.DoubleWritable,
> > VertexInputFormat - class org.apache.hadoop.io.LongWritable
> >
> > I don't understand why the types in the input show up as different formats in
> > each error. Also, as far as I could tell, there is no input format for
> > DoubleDouble. Is there a different way to get the graph into Giraph without
> > having to write custom input code? Thoughts would be much appreciated.
> >
> > -----
> > Reference Command:
> > hadoop jar giraph-examples-1.1.0-for-hadoop-1.1.2-jar-with-dependencies.jar
> > org.apache.giraph.GiraphRunner
> > org.apache.giraph.examples.PageRankComputation -vif
> > org.apache.giraph.io.formats.LongLongNullTextInputFormat -vip
> > /user/kenrick/twitter/input -op /user/kenrick/twitter/output -w 30
> > -----
> >
> > Thanks,
> > Kenrick
>
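
The two error messages quoted above name different type slots, which suggests the vertex ID types are compared before the vertex value types. A minimal sketch of that kind of check follows, assuming that ordering. It is not Giraph's actual checkClassTypes code: TypeCheckSketch and firstMismatch are hypothetical names, and Long, Integer and Double stand in for LongWritable, IntWritable and DoubleWritable so the example runs without Hadoop on the classpath.

```java
/** Hypothetical stand-in for a type-compatibility check; not Giraph's code. */
public class TypeCheckSketch {
    /**
     * Compares the types a computation expects with the types an input
     * format produces. Returns null when they line up, otherwise a message
     * naming the first slot that does not match (ID first, then value).
     */
    public static String firstMismatch(Class<?> computationId, Class<?> computationValue,
                                       Class<?> formatId, Class<?> formatValue) {
        if (!computationId.isAssignableFrom(formatId)) {
            return "vertex index types not assignable";
        }
        if (!computationValue.isAssignableFrom(formatValue)) {
            return "vertex value types not assignable";
        }
        return null;
    }

    public static void main(String[] args) {
        // Computation expects (Long ID, Double value), as PageRankComputation
        // expects (LongWritable, DoubleWritable) according to the thread.
        // An Int/Int format fails on the ID before the value is looked at.
        System.out.println(firstMismatch(Long.class, Double.class, Integer.class, Integer.class));
        // A Long/Long format passes the ID check and fails on the value.
        System.out.println(firstMismatch(Long.class, Double.class, Long.class, Long.class));
        // A Long/Double format would satisfy both checks.
        System.out.println(firstMismatch(Long.class, Double.class, Long.class, Double.class));
    }
}
```

Under this reading, switching from the Int/Int format to the Long/Long format fixed the ID mismatch and exposed the value mismatch that was there all along.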
