From: "Jaeho Shin (JIRA)"
To: giraph-dev@incubator.apache.org
Date: Wed, 8 Aug 2012 06:33:10 +0000 (UTC)
Message-ID: <938696553.3249.1344407590466.JavaMail.jiratomcat@issues-vm>
In-Reply-To: <4192876.36357.1342025854696.JavaMail.jiratomcat@issues-vm>
Subject: [jira] [Commented] (GIRAPH-246) Periodic worker calls to context.progress() will prevent timeout on some Hadoop clusters during barrier waits

    [ https://issues.apache.org/jira/browse/GIRAPH-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430904#comment-13430904 ]

Jaeho Shin commented on GIRAPH-246:
-----------------------------------

The reason I used
Progressable instead of Context inside PredicateLock was mainly that Context is a non-static inner class of Mapper, which is hard, if not impossible, to separate out and use as the type of a field in PredicateLock. Using Context would also have complicated the test code a lot, so I tried to minimize both the coupling and the change. We do pass the Context object to every PredicateLock, masked as a Progressable, so there cannot be any leaked progress() calls.

> Periodic worker calls to context.progress() will prevent timeout on some Hadoop clusters during barrier waits
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-246
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-246
>             Project: Giraph
>          Issue Type: Improvement
>          Components: bsp
>    Affects Versions: 0.2.0
>            Reporter: Eli Reisman
>            Assignee: Eli Reisman
>            Priority: Minor
>              Labels: hadoop, patch
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-246-1.patch, GIRAPH-246-2.patch, GIRAPH-246-3.patch, GIRAPH-246-4.patch, GIRAPH-246-5.patch, GIRAPH-246-6.patch, GIRAPH-246-7.patch, GIRAPH-246-8.patch, GIRAPH-246-9.patch
>
>
> This simple change adds a command-line configurable option to GiraphJob that controls the time between calls to context.progress(). It lets workers avoid timeouts during long data load-ins in which some workers complete their input split reads much faster than others, or finish a superstep faster. I found this allowed jobs that were large-scale but had low memory overhead to complete even when they would previously time out during runs on a Hadoop cluster. A timeout is still possible when the worker crashes, runs out of memory, or has other legitimate GC or RPC trouble, but this change prevents unintentional crashes when the worker is actually still healthy.
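
For readers following the thread, here is a minimal sketch of the idea under discussion: a barrier-style wait that wakes up periodically and reports progress through a narrow Progressable-typed field rather than through Mapper.Context. This is not the code from the GIRAPH-246 patches. The class name ProgressingEvent and the parameter progressPeriodMs are hypothetical, and the Progressable interface below is a local stand-in that mirrors the single progress() method of Hadoop's org.apache.hadoop.util.Progressable so the example compiles without Hadoop on the classpath.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/** Local stand-in for Hadoop's Progressable (assumption: one no-arg method). */
interface Progressable {
  void progress();
}

/** Hypothetical, simplified event lock; not Giraph's actual PredicateLock. */
class ProgressingEvent {
  private final Lock lock = new ReentrantLock();
  private final Condition signaled = lock.newCondition();
  /** Only the narrow interface is stored, so tests can pass any stub. */
  private final Progressable progressable;
  /** Hypothetical setting: maximum time between progress() reports. */
  private final long progressPeriodMs;
  private boolean eventOccurred = false;

  ProgressingEvent(Progressable progressable, long progressPeriodMs) {
    this.progressable = progressable;
    this.progressPeriodMs = progressPeriodMs;
  }

  /** Marks the event as done and wakes every waiter. */
  void signal() {
    lock.lock();
    try {
      eventOccurred = true;
      signaled.signalAll();
    } finally {
      lock.unlock();
    }
  }

  /** Blocks until signal(), reporting progress after each timed-out wait. */
  void waitForever() {
    lock.lock();
    try {
      while (!eventOccurred) {
        // Wake up after at most progressPeriodMs even if nobody signals.
        signaled.await(progressPeriodMs, TimeUnit.MILLISECONDS);
        if (!eventOccurred) {
          // Still waiting: tell the framework this task is alive.
          progressable.progress();
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting", e);
    } finally {
      lock.unlock();
    }
  }
}
```

In a real job the object passed to the constructor would be the Mapper's Context, seen only through the Progressable type; in a test it can be a lambda or a counter, which is the decoupling the comment above describes.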