incubator-giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Avery Ching" <avery.ch...@gmail.com>
Subject Re: Review Request: GIRAPH-11 : Improve the graph distribution of Giraph
Date Mon, 14 Nov 2011 06:56:19 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2788/
-----------------------------------------------------------

(Updated 2011-11-14 06:56:19.251685)


Review request for giraph.


Changes
-------

Updated the diff as per Hyunsik's request to build against recent trunk changes.  While I
was waiting I added some fixed and additions as well.

Upgrade ZooKeeper to 3.3.3 from 3.3.1.

Fixed bug in PseudoRandomVertexInputFormat.java where the edges are not fully added (hasEdge
is not the right place to look for the edge).

Fixed bug in BasicRPCCommunications when putting to a local inPartitionMap

Added counter for last checkpointed superstep

Master should refresh the progress every 60 seconds while waiting for workers to ensure that
the job isn't killed

Fixed bugs in vertexCounter, finishedVertexCoutner, edgeCounter, and sentMessages counter
not resetting every update (just cumultatively being added).

Add additional helpful status messages for debugging.

Turned off speculative execution for Giraph (not a good idea).

Added analysis of the partition balancing for debugging


Summary
-------

Warning: This is a very large change!

Vertex ranges no longer exist.  A generic partitioner handles the
division of vertex ids to partitions.  As a default, there is a
HashPartitioner and a HashRangePartitioner that will use the hashCode
of a Java object to decide which partition to place the vertex.
Developers can write their own algorithm to determine how to change
the partitioning as well as implement the assignment of partitions to
workers.  All vertices loaded from the input split are sent to the
owner of the partition rather than loaded locally.  This eliminates the
constraint that the vertices must be ordered in the input split.

The checkpoint format has been changed to suit the new partition
style.  Checkpoints are now a lot simpler.  The master will assign
partitions and the workers will only load their own partitions from
the checkpoint.

Unfortunately, the vertex range implementation was baked into almost
every aspect of the code (hence the ridiculous size of this diff).
But now it should be flexible to support several different graph
partitioning schemes (i.e. hash-based, hash-ranged-based, and for
special cases, fully ranged-based).

Sorry for the long delay, but this way pretty involved.


This addresses bug GIRAPH-11.
    https://issues.apache.org/jira/browse/GIRAPH-11


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/incubator/giraph/trunk/pom.xml 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/PseudoRandomVertexInputFormat.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/CommunicationsInterface.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/RPCCommunications.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerInterface.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerCommunications.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/GeneratedVertexInputFormat.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/GeneratedVertexReader.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/MaxAggregator.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/MinAggregator.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleMutateGraphVertex.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleSuperstepVertex.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SuperstepBalancer.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SuperstepHashPartitioner.java
PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/VerifyMessage.java
PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/AutoBalancer.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertexRangeBalancer.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspService.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceMaster.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspUtils.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GlobalStats.java
PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphState.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/MutableVertex.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/StaticBalancer.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/Vertex.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexEdgeCount.java
PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexRange.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexRangeBalancer.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/WorkerInfo.java
PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/BasicPartitionOwner.java
PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/GraphPartitioner.java
PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashMasterPartitioner.java
PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashPartitioner.java
PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashRangePartitioner.java
PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashRangeWorkerPartitioner.java
PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java
PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/MasterGraphPartitioner.java
PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java
PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionBalancer.java
PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionExchange.java
PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionOwner.java
PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStats.java
PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionUtils.java
PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangeMasterPartitioner.java
PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangePartitionOwner.java
PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangePartitionStats.java
PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangePartitioner.java
PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangeSplitHint.java
PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangeWorkerPartitioner.java
PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java
PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/utils/WritableUtils.java
PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/zk/ZooKeeperExt.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java
1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestVertexRangeBalancer.java
1186590 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestVertexRangeBalancer.java
1201607 

Diff: https://reviews.apache.org/r/2788/diff


Testing
-------

local and MR unittests.  Added some simple unittests for testing the out-of-order input splits
and other balancing algorithms.


Thanks,

Avery


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message