Reply-To: giraph-dev@incubator.apache.org
Content-Type: multipart/alternative; boundary="===============7096775182334982478=="
MIME-Version: 1.0
Subject: Re: Review Request: GIRAPH-11 : Improve the graph distribution of Giraph
From: "Avery Ching"
To: "Avery Ching", "giraph"
Date: Mon, 14 Nov 2011 06:56:19 -0000
Message-ID: <20111114065619.8166.82237@reviews.apache.org>
X-ReviewBoard-URL: https://reviews.apache.org
X-ReviewRequest-URL: https://reviews.apache.org/r/2788/
In-Reply-To: <20111109111816.27743.73419@reviews.apache.org>
References: <20111109111816.27743.73419@reviews.apache.org>

--===============7096775182334982478==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

-----------------------------------------------------------
This is an automatically generated e-mail.
To reply, visit: https://reviews.apache.org/r/2788/
-----------------------------------------------------------

(Updated 2011-11-14 06:56:19.251685)


Review request for giraph.


Changes
-------

Updated the diff as per Hyunsik's request to build against recent trunk changes. While I was waiting, I added some fixes and additions as well.

Upgraded ZooKeeper to 3.3.3 from 3.3.1.
Fixed a bug in PseudoRandomVertexInputFormat.java where the edges were not fully added (hasEdge is not the right place to look for the edge).
Fixed a bug in BasicRPCCommunications when putting to a local inPartitionMap.
Added a counter for the last checkpointed superstep.
Master should refresh the progress every 60 seconds while waiting for workers to ensure that the job isn't killed.
Fixed bugs where the vertexCounter, finishedVertexCounter, edgeCounter, and sentMessages counters were not reset on every update (the values were just being added cumulatively).
Added additional helpful status messages for debugging.
Turned off speculative execution for Giraph (speculative execution is not a good idea for Giraph).
Added analysis of the partition balancing for debugging.


Summary
-------

Warning: This is a very large change!

Vertex ranges no longer exist. A generic partitioner handles the division of vertex ids into partitions. By default, there are a HashPartitioner and a HashRangePartitioner, which use the hashCode of a Java object to decide which partition a vertex is placed in. Developers can write their own algorithm to determine how to change the partitioning, as well as implement the assignment of partitions to workers.

All vertices loaded from the input split are sent to the owner of the partition rather than loaded locally. This eliminates the constraint that the vertices must be ordered in the input split.

The checkpoint format has been changed to suit the new partition style. Checkpoints are now a lot simpler. The master will assign partitions, and the workers will only load their own partitions from the checkpoint.
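
For illustration only (this is not the code in the diff): a minimal sketch of hash-based partition assignment as described above. The class and method names (HashPartitionSketch, partitionFor, groupByPartition) are hypothetical, and the use of a non-negative modulo of hashCode is an assumption about how a hash partitioner could work, not a description of the HashPartitioner added here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: hypothetical names, not the classes added in this diff. */
final class HashPartitionSketch {
    private HashPartitionSketch() { }

    /** Maps a vertex id to a partition in [0, partitionCount) using its hashCode. */
    static int partitionFor(Object vertexId, int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(vertexId.hashCode(), partitionCount);
    }

    /**
     * Buckets vertex ids by partition, as a reader might do before sending each
     * vertex to its partition owner. The bucket chosen for an id does not depend
     * on where the id appears in the input.
     */
    static <I> List<List<I>> groupByPartition(List<I> vertexIds, int partitionCount) {
        List<List<I>> buckets = new ArrayList<List<I>>();
        for (int p = 0; p < partitionCount; p++) {
            buckets.add(new ArrayList<I>());
        }
        for (I id : vertexIds) {
            buckets.get(partitionFor(id, partitionCount)).add(id);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Ids arrive in no particular order, as they might from an input split.
        List<Long> unorderedIds = Arrays.asList(5L, 0L, 6L, 1L);
        List<List<Long>> buckets = groupByPartition(unorderedIds, 2);
        for (int p = 0; p < buckets.size(); p++) {
            System.out.println("partition " + p + " -> " + buckets.get(p));
        }
    }
}
```

Because the partition is derived from the id alone, the same vertex lands in the same partition no matter which input split it is read from, which is why ordered input splits would no longer be needed under this kind of scheme.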

Unfortunately, the vertex range implementation was baked into almost every aspect of the code (hence the ridiculous size of this diff). But now it should be flexible enough to support several different graph partitioning schemes (e.g., hash-based, hash-range-based, and, for special cases, fully range-based).

Sorry for the long delay, but this was pretty involved.


This addresses bug GIRAPH-11.
    https://issues.apache.org/jira/browse/GIRAPH-11


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/incubator/giraph/trunk/pom.xml 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/PseudoRandomVertexInputFormat.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/CommunicationsInterface.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/RPCCommunications.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerInterface.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerCommunications.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/GeneratedVertexInputFormat.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/GeneratedVertexReader.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/MaxAggregator.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/MinAggregator.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleMutateGraphVertex.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleSuperstepVertex.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SuperstepBalancer.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SuperstepHashPartitioner.java PRE-CREATION
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/VerifyMessage.java PRE-CREATION
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/AutoBalancer.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertexRangeBalancer.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspService.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceMaster.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspUtils.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GlobalStats.java PRE-CREATION
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphState.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/MutableVertex.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/StaticBalancer.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/Vertex.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexEdgeCount.java PRE-CREATION
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexRange.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexRangeBalancer.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/WorkerInfo.java PRE-CREATION
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/BasicPartitionOwner.java PRE-CREATION
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/GraphPartitioner.java PRE-CREATION
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashMasterPartitioner.java PRE-CREATION
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashPartitioner.java PRE-CREATION
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashRangePartitioner.java PRE-CREATION
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashRangeWorkerPartitioner.java PRE-CREATION
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java PRE-CREATION
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/MasterGraphPartitioner.java PRE-CREATION
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java PRE-CREATION
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionBalancer.java PRE-CREATION
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionExchange.java PRE-CREATION
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionOwner.java PRE-CREATION
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStats.java PRE-CREATION
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionUtils.java PRE-CREATION
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangeMasterPartitioner.java PRE-CREATION
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangePartitionOwner.java PRE-CREATION
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangePartitionStats.java PRE-CREATION
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangePartitioner.java PRE-CREATION
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangeSplitHint.java PRE-CREATION
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangeWorkerPartitioner.java PRE-CREATION
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java PRE-CREATION
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/utils/WritableUtils.java PRE-CREATION
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/zk/ZooKeeperExt.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1201607
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestVertexRangeBalancer.java 1186590
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestVertexRangeBalancer.java 1201607

Diff: https://reviews.apache.org/r/2788/diff


Testing
-------

Local and MR unit tests. Added some simple unit tests for the out-of-order input splits and other balancing algorithms.


Thanks,

Avery

--===============7096775182334982478==--