Subject: Review Request: GIRAPH-300 : Improve netty reliability with retrying failed connections, tracking requests, thread-safe hash partitioning
From: "Avery Ching"
To: "Avery Ching", "giraph"
Date: Tue, 14 Aug 2012 07:32:06 -0000
Message-ID: <20120814073206.23840.65341@reviews.apache.org>
X-ReviewRequest-URL: https://reviews.apache.org/r/6600/

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6600/
-----------------------------------------------------------

Review request for giraph.


Description
-------

* Upgrade to the most recent stable version of Netty (3.5.3.Final)
* Retry failed connection attempts, up to n failures
* Track requests throughout the system by keeping track of the request id and then matching the request id to the response (minor refactoring of WritableRequest to make requests simpler and support the request id)
* Improve handling of netty exceptions by dumping the exception stack to help debug failures
* Fix a bug in HashWorkerPartitioner by making partitionList thread-safe (the bug causes divide by zero exceptions in real life)


This addresses bug GIRAPH-300.
    https://issues.apache.org/jira/browse/GIRAPH-300


Diffs
-----

  http://svn.apache.org/repos/asf/giraph/trunk/pom.xml 1372575
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyClient.java 1372575
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyServer.java 1372575
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClient.java 1372575
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/RequestInfo.java PRE-CREATION
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/RequestServerHandler.java 1372575
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ResponseClientHandler.java 1372575
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendPartitionMessagesRequest.java 1372575
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendPartitionMutationsRequest.java 1372575
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java 1372575
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WritableRequest.java 1372575
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceMaster.java 1372575
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1372575
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java 1372575
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/utils/TimedLogger.java 1372575
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1372575

Diff: https://reviews.apache.org/r/6600/diff/


Testing
-------

Currently, netty connection failures cause issues with more than 75 workers in my setup. This change allows us to reach 200+ workers in a reasonably reliable network that doesn't kill connections.

This code passes the local Hadoop regressions and the single node Hadoop instance regressions. It also succeeded on large runs (200+ workers) on a real Hadoop cluster.


Thanks,

Avery Ching
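

Illustration for the HashWorkerPartitioner item in the description
-------

The following is a minimal sketch of how a thread-safe partition list can avoid a divide by zero, not the code in the diff. The class name SnapshotHashPartitioner, its methods, and the use of Integer worker ids are hypothetical. The assumed failure mode is a reader computing hash % partitionList.size() while another thread has cleared the list to rebuild it, so size() is momentarily 0. The sketch builds the new list off to the side, publishes it with a single volatile write, and reads the field once per lookup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a hash partitioner; not Giraph's actual class. */
class SnapshotHashPartitioner {
  /** Immutable snapshot; replaced as a whole, never modified in place. */
  private volatile List<Integer> partitionOwners = Collections.emptyList();

  /** Copy the new owners, then publish the copy with one volatile write. */
  void updatePartitionOwners(List<Integer> newOwners) {
    partitionOwners = Collections.unmodifiableList(new ArrayList<>(newOwners));
  }

  /** Map a vertex id to an owner using one consistent snapshot of the list. */
  int getPartitionOwner(Object vertexId) {
    List<Integer> snapshot = partitionOwners; // read the field exactly once
    if (snapshot.isEmpty()) {
      // Fail with a clear message instead of an ArithmeticException.
      throw new IllegalStateException("No partitions assigned yet");
    }
    // floorMod keeps the index non-negative for negative hash codes.
    int index = Math.floorMod(vertexId.hashCode(), snapshot.size());
    return snapshot.get(index);
  }

  public static void main(String[] args) {
    SnapshotHashPartitioner partitioner = new SnapshotHashPartitioner();
    List<Integer> owners = new ArrayList<>();
    owners.add(10);
    owners.add(11);
    owners.add(12);
    partitioner.updatePartitionOwners(owners);
    System.out.println(partitioner.getPartitionOwner(4));
  }
}
```

With three owners (10, 11, 12), a lookup for vertex id 4 picks index 4 mod 3 = 1, which is owner 11. A CopyOnWriteArrayList would be another option, though calling clear() followed by addAll() on it would still expose an empty list between the two calls.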