Subject: Review Request: GIRAPH-302: Thread safety issue with sending partitions around
From: "Avery Ching"
To: "Avery Ching", "giraph"
Date: Fri, 17 Aug 2012 08:18:45 -0000
Message-ID: <20120817081845.23840.80780@reviews.apache.org>
X-ReviewRequest-URL: https://reviews.apache.org/r/6676/

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6676/
-----------------------------------------------------------

Review request for giraph.


Description
-------

When calling sendPartitionRequest(), we clear the vertex list afterward, making it a race!

I noticed this when I was running with 300 workers and the number of edges wasn't what I expected.
Sometimes we get empty requests! After digging into the code I found the issue and have fixed it.

Giraph Stats
  Aggregate edges                  99,971,220    0    99,971,220
  Superstep                                11    0            11
  Current workers                         300    0           300
  Last checkpointed superstep               0    0             0
  Current master task partition             0    0             0
  Sent messages                             0    0             0
  Aggregate finished vertices      10,000,000    0    10,000,000
  Aggregate vertices               10,000,000    0    10,000,000

This is wrong!

Giraph Stats
  Aggregate edges                 100,000,000    0   100,000,000
  Superstep                                11    0            11
  Last checkpointed superstep               0    0             0
  Current workers                         300    0           300
  Current master task partition             0    0             0
  Sent messages                             0    0             0
  Aggregate finished vertices      10,000,000    0    10,000,000
  Aggregate vertices               10,000,000    0    10,000,000

Fixed!

Also added a few messages for better debugging.


This addresses bug GIRAPH-302.
    https://issues.apache.org/jira/browse/GIRAPH-302


Diffs
-----

  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1373682

Diff: https://reviews.apache.org/r/6676/diff/


Testing
-------

Passed unit tests and verified on a real cluster using 300 machines.


Thanks,

Avery Ching
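
For readers who want to see the kind of race described above, here is a minimal, self-contained sketch. It is not the Giraph code and not necessarily how the patch fixes the issue: the class PartitionSendSketch, the helper sendAsync(), and the latch used to force the ordering are all hypothetical names made up for illustration. It only shows that a sender which reads a shared list after the caller has cleared it sees an empty payload, while a sender that was handed its own copy does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PartitionSendSketch {

    // Hypothetical stand-in for an asynchronous send. It looks at the payload
    // only once the gate opens, modelling a request that is serialized after
    // the calling method has already moved on.
    static Future<Integer> sendAsync(ExecutorService pool,
                                     final List<Integer> payload,
                                     final CountDownLatch gate) {
        return pool.submit(() -> {
            gate.await();
            return payload.size(); // number of vertices the "request" carries
        });
    }

    // Returns how many vertices the sender saw. With copyBeforeSend == false
    // the sender shares the caller's list; with true it gets a private copy.
    static int verticesSeenBySender(boolean copyBeforeSend) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            List<Integer> vertexList = new ArrayList<Integer>(Arrays.asList(1, 2, 3));
            CountDownLatch gate = new CountDownLatch(1);
            List<Integer> payload =
                copyBeforeSend ? new ArrayList<Integer>(vertexList) : vertexList;

            Future<Integer> sent = sendAsync(pool, payload, gate);
            vertexList.clear(); // caller reuses its buffer right after "sending"
            gate.countDown();   // only now does the sender read the payload
            return sent.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("shared list: " + verticesSeenBySender(false)); // prints "shared list: 0"
        System.out.println("copied list: " + verticesSeenBySender(true));  // prints "copied list: 3"
    }
}
```

The latch makes the bad interleaving happen every time so the effect is reproducible. In a real job the ordering depends on timing, which would be consistent with only some requests arriving empty.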