incubator-giraph-user mailing list archives

From Avery Ching <ach...@apache.org>
Subject Re: Duplicate vertices?
Date Sat, 01 Oct 2011 23:15:02 GMT
I mean that

input split 0 should have 0, 1, 7
input split 1 should have 13, 20
input split 2 should have 24, 87, 108
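
As a rough illustration of that layout, here is a minimal Python sketch. It is not Giraph code; `contiguous_splits` and `round_robin_splits` are hypothetical helper names, and it assumes the vertex ids fit in a list. The first cuts the sorted ids into contiguous chunks; the second reproduces the "id mod number of splits" layout from the question, for contrast.

```python
def contiguous_splits(vertex_ids, num_splits):
    """Cut the sorted ids into num_splits contiguous, ordered chunks.

    Hypothetical helper: any contiguous cut would do; this one just
    spreads the ids as evenly as possible.
    """
    ordered = sorted(vertex_ids)
    base, extra = divmod(len(ordered), num_splits)
    splits, start = [], 0
    for i in range(num_splits):
        # The first `extra` chunks take one additional id each.
        size = base + (1 if i < extra else 0)
        splits.append(ordered[start:start + size])
        start += size
    return splits


def round_robin_splits(vertex_ids, num_splits):
    """The 'id mod num_splits' layout, whose id ranges interleave."""
    splits = [[] for _ in range(num_splits)]
    for vid in sorted(vertex_ids):
        splits[vid % num_splits].append(vid)
    return splits
```

With the ids above and three splits, `contiguous_splits` puts its chunk boundaries in slightly different places than the example, but each chunk is still sorted and the chunks do not interleave, which is the property that matters here.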

I think a clearer definition would be something like:
1)  Any vertex read by the VertexReader should have a vertex id greater 
than that of the vertex read before it.
2)  If an input split A has a vertex with a vertex id < any vertex id in 
input split B, then all vertex ids in input split A must be < all vertex 
ids in input split B.
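
The two rules can be checked offline before a job is submitted. The following is a minimal Python sketch under stated assumptions: `check_splits` is a hypothetical helper, not part of Giraph, and each split is given as the list of vertex ids in the order the reader would return them.

```python
def check_splits(splits):
    """Return a list of human-readable violations of the two rules.

    An empty list means every split is strictly increasing (rule 1)
    and the id ranges of the splits do not interleave (rule 2).
    """
    problems = []

    # Rule 1: within a split, each id must be greater than the previous one.
    for idx, ids in enumerate(splits):
        for prev, cur in zip(ids, ids[1:]):
            if cur <= prev:
                problems.append(f"split {idx}: id {cur} read after {prev}")

    # Rule 2: order the non-empty splits by their smallest id; each split
    # must then end before the next one begins.
    ranges = sorted((min(ids), max(ids), idx)
                    for idx, ids in enumerate(splits) if ids)
    for (_, hi_a, a), (lo_b, _, b) in zip(ranges, ranges[1:]):
        if hi_a >= lo_b:
            problems.append(f"splits {a} and {b} have interleaved id ranges")

    return problems
```

Run on the example splits above, the check returns no problems; run on a "mod 3" layout such as `[[0, 3, 6], [1, 4, 7], [2, 5, 8]]`, it reports interleaved ranges. Note that it only checks the two rules as written and does not require split 0 to hold the smallest ids.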

Hope that helps.  Let me know if you have more questions.

Avery

On 10/1/11 3:59 PM, Aapo Kyrola wrote:
> Hi Avery,
>
> can you elaborate a bit?
>
> So I load vertices in order, but with skipping:
>
> so partition 0 will read vertex 0, vertex 3, 6, …
> partition 1 will read vertex 1, vertex 4, …
>
> Do you mean the vertices must be consecutive in the
> split?
>
> Aapo
>
>
>
> On Oct 1, 2011, at 6:57 PM, Avery Ching wrote:
>
>> Unfortunately, someone (probably me) needs to make a wiki page on this 
>> issue.  Currently, we require that your vertices are globally sorted 
>> by vertex id and that the vertices read in each input split are in 
>> order by vertex id.  That probably explains the weirdness you are 
>> seeing.  This issue is being addressed (albeit slowly because of new 
>> job) in https://issues.apache.org/jira/browse/GIRAPH-11.  The issue 
>> is also described a bit more fully there.
>>
>> Avery
>>
>> On 10/1/11 12:44 PM, Aapo Kyrola wrote:
>>>
>>> Hi,
>>>
>>> I have a very difficult problem to debug. Several vertices seem to 
>>> be duplicated -
>>> maybe I am not reading the inputs properly? Here is more info:
>>>
>>> - I have three input splits and use three workers. I have written my 
>>> own input-dataformat
>>> (part of the zip I sent a few days ago). In split one, I have ids mod 
>>> 3 = 0, then ids mod 3 = 1, etc.
>>>
>>> I added some extra debug output for vertex id 875600:
>>>
>>> - I checked that the vertex 875600 is read only once, with 8 edges 
>>> by adding a System.out.println debug:
>>> ::: READ: 875600 ; 8 : [81066, 271870, 272882, 483962, 621946, 
>>> 723717, 834555, 845506]
>>>
>>> - in vertex.compute I write the hostname of the computer 
>>> and how many messages and
>>> edges there are. From this I see that the vertex appears on two 
>>> different hosts, because I get
>>> two types of output:
>>>
>>> hostA.ml.cmu.edu 875600* => 0.0 / 0.0 msgs=0/6813839/8
>>>
>>> hostB.ml.cmu.edu 875600* => -3.4657359027997265 / -3.4657359027997265 msgs=5/6813839/0
>>>
>>>
>>> Note that the last string in the debug output is 
>>> num-of-messages/num-edges/num-out-edges.
>>>
>>> On hostB, this vertex has no edges, but on hostA, it has the 
>>> correct 8 edges.
>>>
>>> --
>>>
>>> Does it matter how I split the vertex-ids?
>>>
>>>
>>>
>>> P.S. For the next report I will make an Apache account. Too busy now.
>>>
>>>
>>> Aapo Kyrola
>>> Ph.D. student, http://www.cs.cmu.edu/~akyrola
>>>
>>
>
> Aapo Kyrola
> Ph.D. student, http://www.cs.cmu.edu/~akyrola
>

