incubator-giraph-user mailing list archives

From Aapo Kyrola <akyr...@cs.cmu.edu>
Subject Re: Duplicate vertices?
Date Sat, 01 Oct 2011 22:59:49 GMT
Hi Avery, 

Can you elaborate a bit?

So I load vertices in order, but with skipping:

partition 0 will read vertices 0, 3, 6, …
partition 1 will read vertices 1, 4, …

Do you mean the vertices must be consecutive in the split?

Aapo
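
To make the question concrete, here is a minimal sketch that compares the round-robin ("skipping") layout described above with a contiguous range layout. It is not Giraph code: the class and method names are hypothetical, and it assumes that "globally sorted" in the quoted reply means the vertex ids ascend when the splits are read one after another in split order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; none of these names come from Giraph.
final class SplitOrderCheck {

    // Round-robin layout from the question: split p holds ids p, p + k, p + 2k, ...
    static List<List<Long>> roundRobinSplits(long numVertices, int numSplits) {
        List<List<Long>> splits = new ArrayList<>();
        for (int p = 0; p < numSplits; p++) {
            splits.add(new ArrayList<>());
        }
        for (long id = 0; id < numVertices; id++) {
            splits.get((int) (id % numSplits)).add(id);
        }
        return splits;
    }

    // Contiguous layout: split 0 holds the lowest ids, split 1 the next block, and so on.
    static List<List<Long>> rangeSplits(long numVertices, int numSplits) {
        List<List<Long>> splits = new ArrayList<>();
        long base = numVertices / numSplits;
        long extra = numVertices % numSplits;
        long next = 0;
        for (int p = 0; p < numSplits; p++) {
            long size = base + (p < extra ? 1 : 0);
            List<Long> split = new ArrayList<>();
            for (long i = 0; i < size; i++) {
                split.add(next++);
            }
            splits.add(split);
        }
        return splits;
    }

    // Assumed reading of the constraint: ids strictly ascend within each split
    // and across split boundaries. Strict ascent also rejects duplicate ids.
    static boolean isGloballySorted(List<List<Long>> splits) {
        boolean first = true;
        long previous = 0;
        for (List<Long> split : splits) {
            for (long id : split) {
                if (!first && id <= previous) {
                    return false;
                }
                previous = id;
                first = false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("round-robin: " + isGloballySorted(roundRobinSplits(9, 3)));
        System.out.println("ranges:      " + isGloballySorted(rangeSplits(9, 3)));
    }
}
```

Under that assumption, each round-robin split is sorted on its own, but split 1 starts at id 1 after split 0 has already reached id 6, so the global check fails; the contiguous layout passes.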



On Oct 1, 2011, at 6:57 PM, Avery Ching wrote:

> Unfortunately, someone (probably me) needs to make a wiki page on this issue.
> Currently, we require that your vertices are globally sorted by vertex id and
> that the vertices read in each input split are in order by vertex id. That
> probably explains the weirdness you are seeing. This issue is being addressed
> (albeit slowly because of a new job) in
> https://issues.apache.org/jira/browse/GIRAPH-11.
> The issue is also described a bit more fully there.
> 
> Avery
> 
> On 10/1/11 12:44 PM, Aapo Kyrola wrote:
>> 
>> 
>> Hi,
>> 
>> I have a very difficult problem to debug. Several vertices seem to be duplicated -
>> maybe I am not reading the inputs properly? Here is more info:
>> 
>> - I have three input splits and use three workers. I have written my own input data format
>> (part of the zip I sent a few days ago). In split one, I have ids mod 3 = 0, then ids
>> mod 3 = 1, etc.
>> 
>> I added some extra debug output for vertex id 875600:
>> 
>> - I checked that vertex 875600 is read only once, with 8 edges, by adding a
>> System.out.println debug:
>>  ::: READ: 875600 ; 8 : [81066, 271870, 272882, 483962, 621946, 723717, 834555, 845506]
>> 
>> - In vertex.compute I write the hostname of the computer and how many messages and
>> edges there are. From this I see that the vertex appears on two different hosts,
>> because I get two types of output:
>> 
>> hostA.ml.cmu.edu 875600* => 0.0 / 0.0 msgs=0/6813839/8
>> 
>> hostB.ml.cmu.edu 875600* => -3.4657359027997265 / -3.4657359027997265 msgs=5/6813839/0
>> 
>> 
>> Note that the last string in the debug output is num-of-messages/num-edges/num-out-edges.
>> 
>> On hostB this vertex has no edges, but on hostA it has the correct 8 edges.
>> 
>> --
>> 
>> Does it matter how I split the vertex-ids?
>> 
>> 
>> 
>> ps. For next report I will make an Apache account. Too busy now..
>> 
>> 
>> Aapo Kyrola
>> Ph.D. student, http://www.cs.cmu.edu/~akyrola
>> 
> 

Aapo Kyrola
Ph.D. student, http://www.cs.cmu.edu/~akyrola

