giraph-user mailing list archives

From ghufran malik <ghufran1ma...@gmail.com>
Subject Running ConnectedComponents in a cluster.
Date Wed, 16 Apr 2014 16:21:48 GMT
Hi,

I have set up Giraph on my university's computer cluster (Giraph
1.1.0-SNAPSHOT-for-hadoop-2.0.0-cdh4.3.1). I've successfully run the
connected components algorithm on a very small test dataset using 4 workers,
and it produced the expected output.

dataset:

vertex id, vertex value, neighbours....

0 0 1
1 1 0 2 3
2 2 1 3
3 3 1 2

output:
1    0
0    0
3    0
2    0
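For reference, here is a single-machine sketch of min-label ("HashMin") propagation, which is how connected components is commonly computed in a Pregel-style system. It is not Giraph's own ConnectedComponentsComputation. The class and method names are hypothetical, and the sketch assumes every vertex value starts equal to its vertex ID, as in the dataset above. On the four-vertex graph it labels every vertex 0 and finishes after three supersteps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MinLabelPropagation {

    /** Final label per vertex plus the number of supersteps executed. */
    static final class Result {
        final Map<Integer, Integer> labels;
        final int supersteps;
        Result(Map<Integer, Integer> labels, int supersteps) {
            this.labels = labels;
            this.supersteps = supersteps;
        }
    }

    /** Queue `label` as a message to every vertex in `targets`. */
    private static void send(Map<Integer, List<Integer>> box, List<Integer> targets, int label) {
        for (int n : targets) box.computeIfAbsent(n, k -> new ArrayList<>()).add(label);
    }

    /** Input lines use the "id value neighbours..." format, space separated. */
    static Result run(List<String> lines) {
        Map<Integer, Integer> labels = new TreeMap<>();
        Map<Integer, List<Integer>> adj = new HashMap<>();
        for (String line : lines) {
            String[] t = line.trim().split("\\s+");
            int id = Integer.parseInt(t[0]);
            labels.put(id, Integer.parseInt(t[1]));
            List<Integer> nbrs = new ArrayList<>();
            for (int i = 2; i < t.length; i++) nbrs.add(Integer.parseInt(t[i]));
            adj.put(id, nbrs);
        }
        // Superstep 0: each vertex takes the minimum of its own value and its
        // neighbours' IDs (equal to their values, since values start as IDs).
        Map<Integer, List<Integer>> inbox = new HashMap<>();
        for (int v : new ArrayList<>(labels.keySet())) {
            int best = labels.get(v);
            for (int n : adj.get(v)) best = Math.min(best, n);
            if (best < labels.get(v)) {
                labels.put(v, best);
                send(inbox, adj.get(v), best);
            }
        }
        int supersteps = 1;
        // Later supersteps: only vertices that received messages do any work;
        // a vertex that lowers its label forwards the new label to its neighbours.
        while (!inbox.isEmpty()) {
            Map<Integer, List<Integer>> next = new HashMap<>();
            for (Map.Entry<Integer, List<Integer>> e : inbox.entrySet()) {
                int v = e.getKey();
                int current = labels.getOrDefault(v, v); // assumption: missing vertex value = ID
                int best = Collections.min(e.getValue());
                if (best < current) {
                    labels.put(v, best);
                    send(next, adj.getOrDefault(v, Collections.emptyList()), best);
                }
            }
            inbox = next;
            supersteps++;
        }
        return new Result(labels, supersteps);
    }

    public static void main(String[] args) {
        Result r = run(Arrays.asList("0 0 1", "1 1 0 2 3", "2 2 1 3", "3 3 1 2"));
        r.labels.forEach((id, label) -> System.out.println(id + "\t" + label));
        System.out.println("supersteps: " + r.supersteps);
    }
}
```

The computation halts once a superstep produces no messages. A vertex whose label never drops below its own ID therefore keeps its original value.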

However, when I ran this algorithm on a larger dataset (a version of
com-youtube.ungraph from Stanford SNAP, reformatted to match
IntIntNullTextVertexInputFormat), the job completed successfully but produced
incorrect output. Every vertex is simply output with its original value
(which I set to its vertex ID).
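Below is one possible way to do the reformatting described above. It is a hypothetical converter, not the script actually used. It assumes the SNAP file consists of `#` comment lines followed by one tab-separated undirected edge per line. It writes one space-separated `id value neighbours...` line per vertex, with the value set to the vertex ID, matching the sample lines in this message. Because the graph is undirected, each edge is recorded in both endpoints' neighbour lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class EdgeListToAdjacency {

    /** Converts undirected edge-list lines into "id value neighbours..." lines. */
    static List<String> convert(List<String> edgeLines) {
        Map<Integer, SortedSet<Integer>> adj = new TreeMap<>();
        for (String line : edgeLines) {
            String t = line.trim();
            if (t.isEmpty() || t.startsWith("#")) continue; // skip header comments
            String[] parts = t.split("\\s+");
            int u = Integer.parseInt(parts[0]);
            int v = Integer.parseInt(parts[1]);
            // Undirected edge: add it to both endpoints' adjacency sets.
            adj.computeIfAbsent(u, k -> new TreeSet<>()).add(v);
            adj.computeIfAbsent(v, k -> new TreeSet<>()).add(u);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, SortedSet<Integer>> e : adj.entrySet()) {
            // Initial vertex value = vertex ID.
            StringBuilder sb = new StringBuilder().append(e.getKey()).append(' ').append(e.getKey());
            for (int n : e.getValue()) sb.append(' ').append(n);
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> edges = Arrays.asList("# Undirected graph", "0\t1", "1\t2", "1\t3", "2\t3");
        for (String line : convert(edges)) System.out.println(line);
    }
}
```

With this four-edge input, it reproduces the small test dataset shown earlier in this message.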

A snippet of the dataset is provided:

vertex id, vertex value, neighbours....
.......
278447 278447 532613
278449 278449 305447 324115 414238
83899 83899 153460 172614 176613 211448
773749 773749 845366
773748 773748 960388
.......
output produced:
.............
73132    73132
831308    831308
199788    199788
763644    763644
300572    300572
.............
There is not a single vertex whose value differs from its original vertex ID.

The computation also stops after superstep 0 and goes no further, whereas
on my smaller dataset it completes 3 supersteps.

Does anyone have an idea as to why this is?

Kind regards,

Ghufran
