From: Friso van Vollenhoven
To: mapreduce-user@hadoop.apache.org
Subject: reducers run past 100% (does that problem still exist?)
Date: Mon, 21 Jun 2010 15:44:49 +0000

Hi all,

When I run long-running map/reduce jobs, the reducers run past 100% before reaching completion, sometimes as far as 140%. I have searched the mailing list and other resources and found bug reports related to this when using map output compression, but all of them appear to be fixed by now.

The job I am running reads sequence files from HDFS and, in the reducer, inserts records into HBase. The reducer has NullWritable as both output key and output value.

Some additional info:
- the job takes close to 60 hours in total to complete
- there are 10 reducers
- the map output is compressed using the default codec and block compression
- speculative execution is turned off (otherwise we could hit HBase harder than necessary)
- mapred.job.reuse.jvm.num.tasks = 1
- io.sort.factor = 100
- io.sort.record.percent = 0.3
- io.sort.spill.percent = 0.9
- mapred.inmem.merge.threshold = 100
- mapred.job.reduce.input.buffer.percent = 1.0

I am using Hadoop 0.20.2 on a small cluster (1x NN+JT, 4x DN+TT).

Does anyone have a clue? Or can anyone tell me how the progress info for reducers is calculated?

Any help is appreciated.

Regards,
Friso
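For concreteness, the setup described above could be wired up roughly as follows against the old `mapred` API of Hadoop 0.20. This is only a sketch: the class name `HBaseLoadJobSketch` is hypothetical, and only the property keys and values come from the description above.

```java
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;

public class HBaseLoadJobSketch {
    public static JobConf configure() {
        JobConf conf = new JobConf(HBaseLoadJobSketch.class);

        // Input: sequence files on HDFS.
        conf.setInputFormat(SequenceFileInputFormat.class);

        // The reducer writes into HBase itself and emits nothing.
        conf.setOutputKeyClass(NullWritable.class);
        conf.setOutputValueClass(NullWritable.class);

        // Map output compressed with the default codec
        // (the post also mentions block compression).
        conf.setCompressMapOutput(true);

        // Speculative execution off, to avoid hitting HBase harder than necessary.
        conf.setBoolean("mapred.map.tasks.speculative.execution", false);
        conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);

        conf.setNumReduceTasks(10);
        conf.setInt("mapred.job.reuse.jvm.num.tasks", 1);
        conf.setInt("io.sort.factor", 100);
        conf.setFloat("io.sort.record.percent", 0.3f);
        conf.setFloat("io.sort.spill.percent", 0.9f);
        conf.setInt("mapred.inmem.merge.threshold", 100);
        conf.setFloat("mapred.job.reduce.input.buffer.percent", 1.0f);
        return conf;
    }
}
```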
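On the progress question: one plausible mechanism for overshoot (not confirmed against the 0.20.2 sources here) is that reduce-phase progress is reported as bytes processed over a pre-computed estimate of the input size; if the estimate reflects compressed map output while the counter tracks decompressed bytes read, the ratio can exceed 1.0. A toy illustration with made-up numbers:

```java
public class ProgressOvershootSketch {
    /** Progress as bytes processed over an estimated total input size. */
    static double progress(long bytesProcessed, long estimatedTotalBytes) {
        return (double) bytesProcessed / (double) estimatedTotalBytes;
    }

    public static void main(String[] args) {
        long estimated = 1_000_000L;  // estimate taken from compressed map output size
        long actual = 1_400_000L;     // decompressed bytes actually fed to the reducer
        // 1_400_000 / 1_000_000 = 1.4, i.e. the 140% seen in the job UI
        System.out.println((int) (progress(actual, estimated) * 100) + "%");
    }
}
```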