giraph-user mailing list archives

From Steven Harenberg <sdhar...@ncsu.edu>
Subject Re: Input format problems running Giraph 1.1.0 on Twitter dataset
Date Wed, 29 Apr 2015 15:24:16 GMT
Hey Kenrick,

First, your commands above are wrong: the -vif argument specifies a vertex
input format, and I believe LongLongNullTextInputFormat expects adjacency
list format, whereas your dataset is an edge list. However, even with the
right commands there will be issues and more things you need to do.

I did get the edgelist input format to work by creating a
LongNullTextEdgeInputFormat.java file just like the
giraph-core/src/main/java/org/apache/giraph/io/formats/IntNullTextEdgeInputFormat.java
file, but with longs instead of ints (this also required creating a
LongPair class).

However, I would advise against using an edgelist input format in Giraph as
there are major underlying issues that I never figured out how to resolve.
Namely, for an edgelist format, Giraph only considers a vertex active in
the first superstep if it has an outgoing edge. This means that vertices
with only incoming edges won't be initialized with correct values during
things like PageRank, SSSP, or WCC and hence will output incorrect results.
(You can see my previous thread here:
http://mail-archives.apache.org/mod_mbox/giraph-user/201502.mbox/%3CCAHv2Baw7zFJ-s7dtNMv5dkNxz_zE436krE%2B6G4r3tp-HVgjW2g%40mail.gmail.com%3E
)
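
To see which vertices are affected, here is a minimal Python sketch, not
part of Giraph. It assumes one whitespace-separated "source target" pair
per line, as in the sample from the original post, and the function name is
hypothetical. It lists the vertices that appear only as edge targets, which
are the ones described above as missing from the first superstep.

```python
def vertices_with_only_incoming_edges(edge_lines):
    """Return the vertex IDs that appear as a target but never as a source."""
    sources, targets = set(), set()
    for line in edge_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        sources.add(int(parts[0]))
        targets.add(int(parts[1]))
    return targets - sources


# The three sample edges quoted in the original post.
sample = ["61578010 61147436", "61578037 61147436", "61578040 61147436"]
print(sorted(vertices_with_only_incoming_edges(sample)))  # prints [61147436]
```

In that sample, vertex 61147436 has three incoming edges and no outgoing
ones, so it is the only vertex that would be affected.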

The above issue can be avoided with adjacency list format by specifying the
vertex with no neighbors. For example, if vertex v has only incoming edges,
then you make sure there is a line with just v and no neighbors listed (
http://mail-archives.apache.org/mod_mbox/giraph-user/201408.mbox/%3C1409255770206.93691@uiowa.edu%3E
).
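
One way to prepare such a file is to convert the edge list before loading
it. The following is a minimal Python sketch under some assumptions: the
function name is hypothetical, the output layout "vertex neighbor1
neighbor2 ..." with a single-space separator is assumed to match what the
chosen adjacency list input format parses, and the whole graph is held in
memory, which will not scale to the full Twitter dataset without a
distributed or external-sort variant.

```python
def edge_list_to_adjacency_lines(edge_lines, sep=" "):
    """Turn 'source target' lines into 'vertex neighbor1 neighbor2 ...' lines.

    Every vertex gets a line, including vertices that only have incoming
    edges; those are written as a line containing just the vertex ID.
    """
    neighbors = {}
    for line in edge_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        src, dst = int(parts[0]), int(parts[1])
        neighbors.setdefault(src, []).append(dst)
        neighbors.setdefault(dst, [])  # make sure the target has its own line
    return [
        sep.join(str(v) for v in [vertex] + neighbors[vertex])
        for vertex in sorted(neighbors)
    ]


sample = ["61578010 61147436", "61578037 61147436", "61578040 61147436"]
for out_line in edge_list_to_adjacency_lines(sample):
    print(out_line)
```

For the three sample edges this writes four lines; the first is just
"61147436", the vertex with only incoming edges. The input format class
still has to match the vertex ID and vertex value types of the computation.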

If you figure out how to resolve the edgelist input issue please let me
know.

Regards,
Steve


On Sat, Apr 25, 2015 at 9:54 PM, Kenrick Fernandes <kenrick.f15@gmail.com>
wrote:

> Hi Roman,
>
> Thanks for the quick response. There is no vertex data in this
> dataset though, and the vertex IDs posted above would fit in a
> Long. Would you advise changing the PageRankComputation
> formats, or working on a new input format?
>
> Thanks,
> Kenrick
>
> On Sat, Apr 25, 2015 at 7:40 PM, Roman Shaposhnik <roman@shaposhnik.org>
> wrote:
>
>> One of the slightly annoying things in Giraph is that you have
>> to manually match your input format to your computation. In
>> your case, PageRankComputation requires LongWritable for
>> vertex ID and DoubleWritable for vertex Data. You may need
>> to hack one of the existing formats slightly.
>>
>>
>> Thanks,
>> Roman.
>>
>> On Sat, Apr 25, 2015 at 2:58 PM, Kenrick Fernandes
>> <kenrick.f15@gmail.com> wrote:
>> > Hello,
>> >
>> > I'm trying to get Giraph to read the Twitter dataset as input for the
>> > SimplePageRankComputation program. The dataset format looks like this:
>> > 61578010 61147436
>> > 61578037 61147436
>> > 61578040 61147436
>> > (vertex IDs, with each pair representing an edge)
>> >
>> > When I run the command with
>> > -vif org.apache.giraph.io.formats.IntIntNullTextInputFormat, I get this
>> > error:
>> > java.lang.IllegalArgumentException: checkClassTypes: vertex index types not
>> > assignable, computation - class org.apache.hadoop.io.LongWritable,
>> > VertexInputFormat - class org.apache.hadoop.io.IntWritable
>> >
>> > So I tried running the command with
>> > -vif org.apache.giraph.io.formats.LongLongNullTextInputFormat and I get a
>> > different one:
>> > java.lang.IllegalArgumentException: checkClassTypes: vertex value types not
>> > assignable, computation - class org.apache.hadoop.io.DoubleWritable,
>> > VertexInputFormat - class org.apache.hadoop.io.LongWritable
>> >
>> > I don't understand why the types in the input show up as different formats
>> > in each error. Also, as far as I could tell, there is no input format for
>> > DoubleDouble. Is there a different way to get the graph into Giraph without
>> > having to write custom input code? Thoughts would be much appreciated.
>> >
>> > -----
>> > Reference Command:
>> > hadoop jar giraph-examples-1.1.0-for-hadoop-1.1.2-jar-with-dependencies.jar
>> > org.apache.giraph.GiraphRunner
>> > org.apache.giraph.examples.PageRankComputation -vif
>> > org.apache.giraph.io.formats.LongLongNullTextInputFormat -vip
>> > /user/kenrick/twitter/input -op /user/kenrick/twitter/output -w 30
>> > -----
>> >
>> > Thanks,
>> > Kenrick
>>
>
>
