giraph-dev mailing list archives

From "Avery Ching" <avery.ch...@gmail.com>
Subject Review Request: GIRAPH-302: Thread safety issue with sending partitions around
Date Fri, 17 Aug 2012 08:18:45 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6676/
-----------------------------------------------------------

Review request for giraph.


Description
-------

After calling sendPartitionRequest(), we clear the vertex list, which races with the request that is still being sent!

I noticed this when I was running with 300 workers and the number of edges wasn't what I expected.
Sometimes we get empty requests!

After digging into the code I found the issue and have fixed it.
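
To illustrate this kind of race, here is a minimal, self-contained sketch. It is not the actual Giraph code, and every name in it (PartitionSendSketch, sendAsync, sendThenClear, sendThenReplace) is hypothetical. It assumes the request keeps a reference to the caller's list and reads it later on another thread. A latch forces the unlucky interleaving so the result is deterministic. The second method shows one possible remedy, handing the list over to the request and starting a fresh one, which is not necessarily what the patch does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; none of these names exist in Giraph.
public class PartitionSendSketch {

    // Stand-in for an asynchronous send: it reports how many vertices the
    // request still sees when it is finally "serialized" on another thread.
    static Future<Integer> sendAsync(ExecutorService pool,
                                     List<Integer> vertices,
                                     CountDownLatch callerDone) {
        return pool.submit(() -> {
            callerDone.await();      // serialize only after the caller has moved on
            return vertices.size();
        });
    }

    static List<Integer> fill(int n) {
        List<Integer> vertices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            vertices.add(i);
        }
        return vertices;
    }

    // Buggy pattern: clear the same list the in-flight request references.
    public static int sendThenClear(int n) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            List<Integer> vertices = fill(n);
            CountDownLatch callerDone = new CountDownLatch(1);
            Future<Integer> sent = sendAsync(pool, vertices, callerDone);
            vertices.clear();        // the request now sees an empty list
            callerDone.countDown();
            return sent.get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    // One possible remedy: leave the sent list alone and start a new one.
    public static int sendThenReplace(int n) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            List<Integer> vertices = fill(n);
            CountDownLatch callerDone = new CountDownLatch(1);
            Future<Integer> sent = sendAsync(pool, vertices, callerDone);
            vertices = new ArrayList<>();  // the request keeps sole use of the old list
            vertices.add(-1);              // later additions do not touch the sent data
            callerDone.countDown();
            return sent.get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("send then clear:   " + sendThenClear(5) + " vertices");
        System.out.println("send then replace: " + sendThenReplace(5) + " vertices");
    }
}
```

In this sketch the first call reports 0 vertices (an empty request) and the second reports 5. In a real run the interleaving is not forced, so the empty request would only show up some of the time.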

Giraph Stats
  Counter                         Map          Reduce  Total
  Aggregate edges                 99,971,220   0       99,971,220
  Superstep                       11           0       11
  Current workers                 300          0       300
  Last checkpointed superstep     0            0       0
  Current master task partition   0            0       0
  Sent messages                   0            0       0
  Aggregate finished vertices     10,000,000   0       10,000,000
  Aggregate vertices              10,000,000   0       10,000,000

This is wrong! The aggregate edge count is 28,780 short of the expected 100,000,000.

Giraph Stats
  Counter                         Map          Reduce  Total
  Aggregate edges                 100,000,000  0       100,000,000
  Superstep                       11           0       11
  Last checkpointed superstep     0            0       0
  Current workers                 300          0       300
  Current master task partition   0            0       0
  Sent messages                   0            0       0
  Aggregate finished vertices     10,000,000   0       10,000,000
  Aggregate vertices              10,000,000   0       10,000,000

Fixed!

Also added a few messages for better debugging.


This addresses bug GIRAPH-302.
    https://issues.apache.org/jira/browse/GIRAPH-302


Diffs
-----

  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1373682

Diff: https://reviews.apache.org/r/6676/diff/


Testing
-------

Passed unit tests and verified on a real cluster using 300 machines.


Thanks,

Avery Ching

