giraph-user mailing list archives

From Aapo Kyrola <akyr...@cs.cmu.edu>
Subject Duplicate vertices?
Date Sat, 01 Oct 2011 19:44:08 GMT

Hi,

I have a problem that is very difficult to debug. Several vertices seem to be duplicated;
maybe I am not reading the inputs properly? Here is more info:

- I have three input splits and use three workers. I have written my own input format
(part of the zip I sent a few days ago). Split one holds the ids with id mod 3 = 0, split
two those with id mod 3 = 1, etc.
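
To make the split scheme concrete, here is a minimal sketch of the mod-3 assignment, with a check that lists ids that ended up in the wrong split. The class and method names (ModSplitCheck, splitFor, misplaced) are made up for illustration and are not part of Giraph or of my input format; the sample ids are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Giraph or of the zip.
class ModSplitCheck {

    // Zero-based index of the split that should contain vertexId
    // (index 0 corresponds to "split one" in the description above).
    static int splitFor(long vertexId, int numSplits) {
        return (int) Math.floorMod(vertexId, (long) numSplits);
    }

    // Returns the ids found in one split that do not belong there under the scheme.
    static List<Long> misplaced(long[] idsInSplit, int splitIndex, int numSplits) {
        List<Long> wrong = new ArrayList<>();
        for (long id : idsInSplit) {
            if (splitFor(id, numSplits) != splitIndex) {
                wrong.add(id);
            }
        }
        return wrong;
    }

    public static void main(String[] args) {
        // Arbitrary sample ids; 7 does not satisfy id mod 3 = 0.
        long[] sample = {0L, 3L, 6L, 7L};
        System.out.println("not in split 0: " + misplaced(sample, 0, 3));
    }
}
```

Running misplaced over the ids of each split file would at least rule out an id being written to more than one split.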

I added some extra debug output for vertex id 875600:

- I checked, by adding a System.out.println debug statement, that vertex 875600 is read
only once, with 8 edges:
	::: READ: 875600 ; 8 : [81066, 271870, 272882, 483962, 621946, 723717, 834555, 845506]

- In vertex.compute I write out the hostname of the machine and the number of messages
and edges. From this I see that the vertex appears on two different hosts, because I get
two kinds of output:

hostA.ml.cmu.edu 875600* => 0.0 / 0.0 msgs=0/6813839/8

hostB.ml.cmu.edu 875600* => -3.4657359027997265 / -3.4657359027997265 msgs=5/6813839/0


Note that the last field of the debug line is num-of-messages/num-edges/num-out-edges.

On hostB this vertex has no edges, but on hostA it has the correct 8 edges.
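
To find every vertex affected, not just 875600, the debug lines can be scanned mechanically. The following is a rough sketch that assumes each line has the form shown above (hostname, then the vertex id with an optional trailing "*"); the class name DuplicateVertexFinder and its methods are hypothetical and not part of Giraph.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical log scanner for illustration; not part of Giraph.
class DuplicateVertexFinder {

    // Maps each vertex id to the set of hosts whose debug lines mention it.
    static Map<Long, Set<String>> hostsByVertex(Iterable<String> lines) {
        Map<Long, Set<String>> seen = new TreeMap<>();
        for (String line : lines) {
            String[] tokens = line.trim().split("\\s+");
            if (tokens.length < 2) {
                continue; // blank or unrelated line
            }
            long id;
            try {
                id = Long.parseLong(tokens[1].replace("*", ""));
            } catch (NumberFormatException e) {
                continue; // second token is not a vertex id
            }
            seen.computeIfAbsent(id, k -> new TreeSet<>()).add(tokens[0]);
        }
        return seen;
    }

    // Keeps only the vertex ids reported by more than one host.
    static Map<Long, Set<String>> duplicates(Iterable<String> lines) {
        Map<Long, Set<String>> result = new TreeMap<>();
        for (Map.Entry<Long, Set<String>> e : hostsByVertex(lines).entrySet()) {
            if (e.getValue().size() > 1) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The two debug lines quoted above.
        List<String> log = Arrays.asList(
            "hostA.ml.cmu.edu 875600* => 0.0 / 0.0 msgs=0/6813839/8",
            "hostB.ml.cmu.edu 875600* => -3.4657359027997265 / -3.4657359027997265 msgs=5/6813839/0");
        System.out.println(duplicates(log));
    }
}
```

Feeding the combined stdout of all workers through duplicates would show whether the duplication is limited to a few ids or follows a pattern.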

--

Does it matter how I split the vertex-ids?



P.S. For the next report I will create an Apache account. Too busy now.


Aapo Kyrola
Ph.D. student, http://www.cs.cmu.edu/~akyrola

