From: Andreas Kostyrka
To: core-user@hadoop.apache.org
Subject: Re: Stackoverflow
Date: Tue, 3 Jun 2008 08:09:41 +0200
Message-Id: <200806030809.41664.andreas@kostyrka.org>
References: <200806021912.35346.andreas@kostyrka.org>

On Tuesday 03 June 2008 04:53:22 Chris Douglas wrote:
> Is anyone observing this outside of streaming?
>
> We've been able to reproduce this trace with a bad comparator that
> only returns negative values, but haven't found any uncontrived
> patterns in data that produce this, nor any comparators in 0.17 with
> this property. A bad partitioner also returning only negative values
> would behave similarly, but not this uniformly.

OK, let's take a look. The hadoop call looks like this:

    hadoop jar $HOME/hadoop-0.17.0/contrib/streaming/hadoop-0.17.0-streaming.jar -output /user/hadoop/$(basename $(pwd)) -mapper cat -reducer /home/hadoop/bin/lrp\ --stderr -jobconf mapred.reduce.tasks=88 $CMD

The data is a representation of log lines and is not exactly small; it
has already been reduced once. The bug is probably triggered by size,
because reducing the data in two separate, smaller runs works fine. I
have no small data set that triggers this problem.

The interesting thing is that it happens inside the last map task, not
in the reduce tasks. As you can see above, the mapper command is rather
on the simple side.

> How many reducers are you running? Are you using the 0.17 streaming
> jar? Are you running with the default comparator/partitioner? If you
> run the same job as a Java sort, do you see the same behavior? -C

I am running 88 reducers with the 0.17.0 streaming jar, as the command
above shows. I have no Java implementation of my job, sorry.

Andreas

Hadoop job_200805291303_0088 on ec2-67-202-58-97

User:       hadoop
Job Name:   streamjob51857.jar
Job File:   /mnt/tmp/hadoop-hadoop/mapred/system/job_200805291303_0088/job.xml
Status:     Failed
Started at: Mon Jun 02 16:11:29 GMT 2008
Failed at:  Mon Jun 02 16:13:34 GMT 2008
Failed in:  2mins, 5sec

Kind    % Complete  Num Tasks  Pending  Running  Complete  Killed  Failed/Killed Task Attempts
map         98.61%         72        0        0        71       1  4 / 11
reduce     100.00%         88        0        0         0      88  0 / 22

Counter                                    Map         Reduce          Total
File Systems
  Local bytes written            2,790,820,175    107,780,646  2,898,600,821
  HDFS bytes read                2,633,043,249              0  2,633,043,249
Job Counters
  Failed map tasks                           0              0              1
  Launched map tasks                         0              0             86
  Launched reduce tasks                      0              0             22
  Data-local map tasks                       0              0             69
  Rack-local map tasks                       0              0              5
Map-Reduce Framework
  Map input records                 12,148,547              0     12,148,547
  Map output records                12,148,547              0     12,148,547
  Map input bytes                2,633,043,249              0  2,633,043,249
  Map output bytes               2,645,311,659              0  2,645,311,659
  Combine input records                      0              0              0
  Combine output records                     0              0              0
  Reduce input groups                        0              0              0
  Reduce input records                       0              0              0
  Reduce output records                      0              0              0
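
For reference, the comparator property raised in the quoted reply can be illustrated outside Hadoop. The following is only a sketch in plain Python, not Hadoop's Java comparator interface; the names `violates_antisymmetry`, `bad_compare` and `good_compare` are made up for illustration. It checks whether a comparison function is antisymmetric, i.e. whether compare(a, b) and compare(b, a) have opposite signs. A comparator that only returns negative values fails this check on the first pair it is given.

```python
from itertools import combinations


def sign(x):
    """Map a number to -1, 0 or 1."""
    return (x > 0) - (x < 0)


def violates_antisymmetry(compare, keys):
    """Return the first pair (a, b) for which compare(a, b) and
    compare(b, a) do not have opposite signs, or None if no such
    pair exists among the given keys."""
    for a, b in combinations(keys, 2):
        if sign(compare(a, b)) != -sign(compare(b, a)):
            return (a, b)
    return None


def bad_compare(a, b):
    # Hypothetical broken comparator: always claims a < b.
    return -1


def good_compare(a, b):
    # Ordinary three-way comparison on Python values.
    return (a > b) - (a < b)


keys = ["alpha", "beta", "gamma"]
print(violates_antisymmetry(bad_compare, keys))
print(violates_antisymmetry(good_compare, keys))
```

Running a check like this over a sample of real keys would be one way to rule a custom comparator in or out; it says nothing about the default streaming comparator, which is what the job above uses.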