From: Doug Cook
To: core-user@hadoop.apache.org
Date: Fri, 13 Mar 2009 06:51:01 -0700 (PDT)
Subject: Reduce task going away for 10 seconds at a time

Hi folks,

I've been debugging a severe performance problem with a Hadoop-based application (a highly
modified version of Nutch). I've recently upgraded to Hadoop 0.19.1 from a much, much older version, and a reduce that used to work just fine now runs orders of magnitude more slowly.

From the logs I can see that my reduce stops making progress for periods that average almost exactly 10 seconds (with a very narrow distribution around 10 seconds). These stalls happen in various places in my code, roughly in proportion to how much time I'd expect the task to spend in each place normally. In other words, it looks as if my code is being interrupted at random for 10 seconds at a time.

I'm planning to keep digging, but thought these symptoms might sound familiar to someone on this list. Ring any bells?

Your help much appreciated. Thanks!

Doug Cook