From: "Joydeep Sen Sarma"
Reply-To: hadoop-user@lucene.apache.org
Subject: RE: missing combiner output
Date: Tue, 21 Aug 2007 18:06:10 -0700

Ah - never mind - the 'combiner output record' metric reported by mapred is lying. The reduce job does see all the records. (I guess this is a bug.)

-----Original Message-----
From: Joydeep Sen Sarma [mailto:jssarma@facebook.com]
Sent: Tuesday, August 21, 2007 5:30 PM
To: hadoop-user@lucene.apache.org
Subject: missing combiner output

Hi folks,

I am a little puzzled by what looks to me like records that I am emitting from my combiner but that are not showing up under 'combine output records' (and seem to be disappearing). Here's some evidence:

Mapred says:

Combine input records     230,803,567
Combine output records    112,533,683

I am maintaining three counters and bump one of them when emitting records from the combiner (i.e., the combiner emits three types of key-val pairs):

COMBINERJOIN     28,264,088
COMBINERPASS    199,193,336
COMBINERKEYS      3,346,143

As can be seen, the total number of combiner outputs (the sum of the above three counters) is the same as the combine input records, and that is exactly what I expect from my program. However, something is going wrong somewhere, and not all of the emitted records show up in the combiner output. There are no exceptions in the logs, and the output.collect() interface does not return an error code.

Any ideas what's going on? Is this a pathological case (the combiner emitting the same number of output records as input records)?

Thanks,

Joydeep
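
For anyone checking the arithmetic in the message above, here is a minimal sketch that sums the three user counters and compares the total with the two framework figures. The numbers are copied from the message; the variable names are only illustrative and are not part of Hadoop.

```python
# Framework-reported figures, copied from the job report quoted in the message.
combine_input_records = 230_803_567
combine_output_records = 112_533_683

# User-maintained counters: one is bumped for each record the combiner emits.
emitted = {
    "COMBINERJOIN": 28_264_088,
    "COMBINERPASS": 199_193_336,
    "COMBINERKEYS": 3_346_143,
}

# Total records the combiner claims to have emitted.
total_emitted = sum(emitted.values())

# Gap between what the combiner emitted and what the framework metric reports.
unaccounted = total_emitted - combine_output_records

print("emitted by combiner:   ", total_emitted)
print("matches combine input: ", total_emitted == combine_input_records)
print("missing from the metric:", unaccounted)
```

The user counters add up exactly to the combine input count, so the gap sits entirely in the reported 'Combine output records' figure. That fits the follow-up observation that the reduce side does receive all of the records.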