hadoop-common-user mailing list archives

From "Joydeep Sen Sarma" <jssa...@facebook.com>
Subject RE: missing combiner output
Date Wed, 22 Aug 2007 01:06:10 GMT
Ah, never mind. The 'Combine output records' metric reported by mapred
is lying. The reduce job does see all the records.

(I guess this is a bug.)

-----Original Message-----
From: Joydeep Sen Sarma [mailto:jssarma@facebook.com] 
Sent: Tuesday, August 21, 2007 5:30 PM
To: hadoop-user@lucene.apache.org
Subject: missing combiner output

Hi folks,

I am a little puzzled by what looks to me like records that I am
emitting from my combiner but that are not showing up under 'Combine
output records' (and seem to be disappearing). Here's some evidence:

Mapred says:

Combine input records 230,803,567
Combine output records 112,533,683

I am maintaining three counters and bump one of them whenever I emit a
record from the combiner (i.e., the combiner emits three types of
key-value pairs):

COMBINERJOIN 28,264,088
COMBINERPASS 199,193,336
COMBINERKEYS 3,346,143
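
The counting scheme described above can be sketched roughly as follows. This is a plain-Python illustration, not Hadoop code: the `CountingCollector` class, its `collect` signature, and the sample keys and values are all hypothetical, and only the three counter names come from the message. The point is that each emit bumps exactly one counter, so the counters sum to the number of records handed to the collector.

```python
from collections import Counter


class CountingCollector:
    """Hypothetical stand-in for an output collector that also counts emits."""

    def __init__(self):
        self.records = []          # every record handed to the collector
        self.counters = Counter()  # one counter per kind of record

    def collect(self, kind, key, value):
        # Store the record and bump exactly one counter for it.
        self.records.append((key, value))
        self.counters[kind] += 1


collector = CountingCollector()

# Made-up records, one call per emit; the kinds mirror the three counters.
collector.collect("COMBINERJOIN", "k1", "a|b")
collector.collect("COMBINERPASS", "k2", "c")
collector.collect("COMBINERPASS", "k3", "d")
collector.collect("COMBINERKEYS", "k1", "")

# The per-kind counters always add up to the number of collected records.
emitted = sum(collector.counters.values())
```

Because the counters are bumped in the same call that emits the record, they give a count that is independent of whatever the framework reports.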

 

As the three counters show, the total number of records emitted by the
combiner (the sum of the three counters above) equals the combine input
records, which is exactly what I expect from my program. However,
something is going wrong somewhere, and not all of the emitted records
show up in the combine output records. There are no exceptions in the
logs, and the output.collect() interface does not return an error code.
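
The arithmetic behind this claim can be checked directly. The short Python snippet below uses only the figures quoted in this message; the variable names are made up for the illustration.

```python
# Framework-reported counters, as quoted in the message.
combine_input_records = 230_803_567
combine_output_records = 112_533_683

# User-maintained counters, one bumped per emitted record.
emitted = {
    "COMBINERJOIN": 28_264_088,
    "COMBINERPASS": 199_193_336,
    "COMBINERKEYS": 3_346_143,
}

# Sum of the user counters: 230,803,567, the same as the combine input.
total_emitted = sum(emitted.values())

# Records emitted but not reflected in 'Combine output records': 118,269,884.
unreported = total_emitted - combine_output_records
```

So the user counters account for every input record, while the reported combine output falls short by roughly 118 million records.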

 

Any ideas what's going on? Is this a pathological case (a combiner
emitting the same number of output records as input records)?

Thanks,
Joydeep

