From: ZORAIDA HIDALGO SANCHEZ <zoraida@tid.es>
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Fri, 22 Nov 2013 11:16:35 +0000
Subject: Re: Missing records from HDFS
One more thing,

if we split the files then all the records are processed. The files are 70.5 MB each.

Thanks,

Zoraida.-
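
A note on why splitting the files might matter: if 70.5 MB exceeds the cluster's block/split size (64 MB is a common default), each unsplit file spans two input splits, and a custom CSV record reader then has to handle records that straddle the split boundary; once the files are split into pieces smaller than a block, each file is a single split and the boundary case disappears. The sketch below shows the usual boundary-handling pattern (the same one Hadoop's LineRecordReader follows); the class name CsvRecordReader and the one-line-per-record assumption are illustrative, not the actual code from this thread.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.util.LineReader;

// Sketch of a record reader for a line-oriented CSV FileInputFormat that
// handles records crossing an input-split boundary.
public class CsvRecordReader extends RecordReader<LongWritable, Text> {
    private long start, pos, end;
    private LineReader in;
    private FSDataInputStream fileIn;
    private final LongWritable key = new LongWritable();
    private final Text value = new Text();

    @Override
    public void initialize(InputSplit genericSplit, TaskAttemptContext context) throws IOException {
        FileSplit split = (FileSplit) genericSplit;
        Configuration conf = context.getConfiguration();
        start = split.getStart();
        end = start + split.getLength();
        Path file = split.getPath();
        FileSystem fs = file.getFileSystem(conf);
        fileIn = fs.open(file);
        fileIn.seek(start);
        in = new LineReader(fileIn, conf);
        // If this split does not begin at the start of the file, the first
        // (partial) line belongs to the previous split: skip it here; the
        // reader of the previous split consumes it past its own boundary.
        if (start != 0) {
            start += in.readLine(new Text());
        }
        pos = start;
    }

    @Override
    public boolean nextKeyValue() throws IOException {
        // Keep reading lines that start at or before the split end, so the
        // last record of this split is read completely even if it crosses
        // into the next block. Stopping strictly before the end is the
        // classic bug that silently drops records near the boundary.
        if (pos > end) {
            return false;
        }
        key.set(pos);
        int newSize = in.readLine(value);
        if (newSize == 0) {
            return false;
        }
        pos += newSize;
        return true;
    }

    @Override
    public LongWritable getCurrentKey() { return key; }

    @Override
    public Text getCurrentValue() { return value; }

    @Override
    public float getProgress() {
        return end == start ? 0.0f : Math.min(1.0f, (pos - start) / (float) (end - start));
    }

    @Override
    public void close() throws IOException {
        if (in != null) {
            in.close();
        }
    }
}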

From: zoraida <zoraida@tid.es>
Date: Friday, 22 November 2013 08:59
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Re: Missing records from HDFS

Thanks for your response Azuryy.

My Hadoop version: 2.0.0-cdh4.3.0
InputFormat: a custom class that extends FileInputFormat (a CSV input format)
These files are in the same directory, as separate files.
My input path is configured by Oozie through the property mapred.input.dir.
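
For context on that wiring (a sketch, not the actual driver from this thread): mapred.input.dir is the older name of the input-path property that FileInputFormat reads (newer Hadoop versions map it to mapreduce.input.fileinputformat.inputdir), so setting it from an Oozie action is roughly equivalent to calling FileInputFormat.setInputPaths in the job driver. The class names and argument layout below are assumptions for illustration only.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver showing the property/API correspondence; substitute the
// real custom CSV InputFormat for TextInputFormat.
public class CsvDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "csv-job");
        job.setJarByClass(CsvDriver.class);
        // Writes the same configuration entry that Oozie's mapred.input.dir supplies.
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        job.setInputFormatClass(TextInputFormat.class); // placeholder for the custom CSV format
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Mapper/reducer settings omitted; the defaults act as an identity pass-through.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}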


The same code and input running on Hadoop 2.0.0-cdh4.2.1 works fine and does not discard any records.

Thanks.

From: Azuryy Yu <azuryyyu@gmail.com>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Thursday, 21 November 2013 07:31
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Re: Missing records from HDFS

What's your Hadoop version, and which InputFormat are you using?

Are these files under one directory, or are there lots of subdirectories? How did you configure the input path in your main method?



On Thu, Nov 21, 2013 at 12:25 AM, ZORAIDA HIDALGO SANCHEZ <zoraida@tid.es> wrote:
Hi all,

my job is not reading all the input records. In the input directory I have a set of files containing a total of 6,000,000 records, but only 5,997,000 are processed. The Map Input Records counter says 5,997,000.
I have tried downloading the files with a getmerge to check how many records would be returned, and the correct number (6,000,000) is returned.

Do you have any suggestion? 

Thanks. 







This message is intended exclusively for its addressee. We only send and receive email on the basis of the terms set out at: http://www.tid.es/ES/PAGINAS/disclaimer.aspx