From: Friso van Vollenhoven
To: mapreduce-user@hadoop.apache.org
Subject: reducers run past 100% (does that problem still exist?)
Date: Mon, 21 Jun 2010 15:44:49 +0000

Hi all,

When I run long-running map/reduce jobs, the reducers run past 100% before reaching completion, sometimes as far as 140%. I have searched the mailing list and other resources and found bug reports related to this when using map output compression, but all of them appear to be fixed by now.

The job I am running reads sequence files from HDFS and, in the reducer, inserts records into HBase. The reducer has NullWritable as both output key and output value.

Some additional info:
- the job takes close to 60 hours in total to complete
- there are 10 reducers
- the map output is compressed using the default codec and block compression
- speculative execution is turned off (otherwise we could hit HBase harder than necessary)
- mapred.job.reuse.jvm.num.tasks = 1
- io.sort.factor = 100
- io.sort.record.percent = 0.3
- io.sort.spill.percent = 0.9
- mapred.inmem.merge.threshold = 100
- mapred.job.reduce.input.buffer.percent = 1.0

I am using Hadoop 0.20.2 on a small cluster (1x NN+JT, 4x DN+TT).

Does anyone have a clue? Or can anyone tell me how the progress info for reducers is calculated?

Any help is appreciated.

Regards,
Friso
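For concreteness, the setup described above could be wired up roughly as follows against the old `mapred` API of Hadoop 0.20. This is only a sketch: the class name `HBaseLoadJobSketch` is hypothetical, and only the property keys and values come from the description above.

```java
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;

public class HBaseLoadJobSketch {
    public static JobConf configure() {
        JobConf conf = new JobConf(HBaseLoadJobSketch.class);

        // Input: sequence files on HDFS.
        conf.setInputFormat(SequenceFileInputFormat.class);

        // The reducer writes into HBase itself and emits nothing.
        conf.setOutputKeyClass(NullWritable.class);
        conf.setOutputValueClass(NullWritable.class);

        // Map output compressed with the default codec
        // (the post also mentions block compression).
        conf.setCompressMapOutput(true);

        // Speculative execution off, to avoid hitting HBase harder than necessary.
        conf.setBoolean("mapred.map.tasks.speculative.execution", false);
        conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);

        conf.setNumReduceTasks(10);
        conf.setInt("mapred.job.reuse.jvm.num.tasks", 1);
        conf.setInt("io.sort.factor", 100);
        conf.setFloat("io.sort.record.percent", 0.3f);
        conf.setFloat("io.sort.spill.percent", 0.9f);
        conf.setInt("mapred.inmem.merge.threshold", 100);
        conf.setFloat("mapred.job.reduce.input.buffer.percent", 1.0f);
        return conf;
    }
}
```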
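On the progress question: one plausible mechanism for overshoot (not confirmed against the 0.20.2 sources here) is that reduce-phase progress is reported as bytes processed over a pre-computed estimate of the input size; if the estimate reflects compressed map output while the counter tracks decompressed bytes read, the ratio can exceed 1.0. A toy illustration with made-up numbers:

```java
public class ProgressOvershootSketch {
    /** Progress as bytes processed over an estimated total input size. */
    static double progress(long bytesProcessed, long estimatedTotalBytes) {
        return (double) bytesProcessed / (double) estimatedTotalBytes;
    }

    public static void main(String[] args) {
        long estimated = 1_000_000L;  // estimate taken from compressed map output size
        long actual = 1_400_000L;     // decompressed bytes actually fed to the reducer
        // 1_400_000 / 1_000_000 = 1.4, i.e. the 140% seen in the job UI
        System.out.println((int) (progress(actual, estimated) * 100) + "%");
    }
}
```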