Message-ID: <809876.16516.qm@web39107.mail.mud.yahoo.com>
Date: Thu, 9 Apr 2009 15:04:53 -0700 (PDT)
From: Steve Gao
Subject: [Interesting] One reducer randomly hangs on getting 0 mapper output
To: core-user@hadoop.apache.org

I have Hadoop jobs in which the last reducer randomly hangs while fetching map output and receives nothing. By "randomly" I mean that a job sometimes completes correctly, and sometimes its last reducer keeps polling for map output but always gets 0 outputs. It has hung like this for up to 100 hours, still getting no data, until I killed it. After I kill the job and re-run it, it may run correctly. The hung reducer can appear on any machine in my cluster.

I have attached the tail of the problematic reducer's log below. Does anybody have a hint as to what happened?

syslog logs

2009-04-09 21:57:46,445 INFO org.apache.hadoop.mapred.ReduceTask: task_200902022141_50382_r_000008_0 Need 15 map output(s)
2009-04-09 21:57:46,446 INFO org.apache.hadoop.mapred.ReduceTask: task_200902022141_50382_r_000008_0: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
2009-04-09 21:57:46,446 INFO org.apache.hadoop.mapred.ReduceTask: task_200902022141_50382_r_000008_0 Got 0 known map output location(s); scheduling...
2009-04-09 21:57:46,446 INFO org.apache.hadoop.mapred.ReduceTask: task_200902022141_50382_r_000008_0 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
2009-04-09 21:57:51,453 INFO org.apache.hadoop.mapred.ReduceTask: task_200902022141_50382_r_000008_0 Need 15 map output(s)
2009-04-09 21:57:51,460 INFO org.apache.hadoop.mapred.ReduceTask: task_200902022141_50382_r_000008_0: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
2009-04-09 21:57:51,460 INFO org.apache.hadoop.mapred.ReduceTask: task_200902022141_50382_r_000008_0 Got 0 known map output location(s); scheduling...
2009-04-09 21:57:51,460 INFO org.apache.hadoop.mapred.ReduceTask: task_200902022141_50382_r_000008_0 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
... (forever)
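
The log tail above repeats the same polling round every five seconds: "Need 15 map output(s)" followed by "Got 0 new map-outputs". As a rough aid for spotting this state early rather than after 100 hours, here is a minimal Python sketch that scans a reducer's syslog and reports how long each task has been polling without progress. It assumes only the two line formats quoted in this message; the function name `stalled_seconds` and the regular expressions are illustrative and not part of Hadoop. It only detects the symptom and says nothing about the cause.

```python
import re
from datetime import datetime

# Assumption: these patterns cover only the two ReduceTask line formats
# quoted in this message; other Hadoop versions may log differently.
NEED_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} INFO "
    r"org\.apache\.hadoop\.mapred\.ReduceTask: (\S+) Need (\d+) map output\(s\)"
)
GOT_RE = re.compile(r"ReduceTask: (\S+): Got (\d+) new map-outputs")


def stalled_seconds(lines):
    """Return {task_id: seconds} spent polling with no progress.

    A task counts as stalled while its "Need N" count stays the same
    and no round reports any new map-outputs. (Hypothetical helper.)
    """
    state = {}  # task_id -> [stall_start, last_poll, needed]
    for line in lines:
        line = line.strip()
        m = NEED_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            task, needed = m.group(2), int(m.group(3))
            entry = state.get(task)
            if entry is not None and entry[2] == needed:
                entry[1] = ts  # same count as before: stall continues
            else:
                state[task] = [ts, ts, needed]  # count changed: restart clock
            continue
        m = GOT_RE.search(line)
        if m and int(m.group(2)) > 0:
            state.pop(m.group(1), None)  # new outputs arrived: not stalled
    return {
        task: (last - start).total_seconds()
        for task, (start, last, _needed) in state.items()
    }


# The two polling rounds from the log tail in this message.
TASK = "task_200902022141_50382_r_000008_0"
PREFIX = "INFO org.apache.hadoop.mapred.ReduceTask: " + TASK
sample = [
    "2009-04-09 21:57:46,445 " + PREFIX + " Need 15 map output(s)",
    "2009-04-09 21:57:46,446 " + PREFIX + ": Got 0 new map-outputs & 0 obsolete "
    "map-outputs from tasktracker and 0 map-outputs from previous failures",
    "2009-04-09 21:57:51,453 " + PREFIX + " Need 15 map output(s)",
    "2009-04-09 21:57:51,460 " + PREFIX + ": Got 0 new map-outputs & 0 obsolete "
    "map-outputs from tasktracker and 0 map-outputs from previous failures",
]
result = stalled_seconds(sample)
print(result)
```

On the two rounds quoted here the reported stall is 5 seconds. Run against a full syslog, a value that grows well beyond the normal shuffle time for the job would be a reason to kill and re-run the task; what counts as too long is a judgment call for each cluster.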