Subject: Re: MAP_INPUT_BYTES missing from counters
From: yypvsxf19870706
To: user@hadoop.apache.org
Date: Sat, 6 Apr 2013 19:37:43 +0800

Hi,

Is your input file compressed, or named with a suffix such as gz?

It is interesting.

MAP_INPUT_BYTES is the number of bytes of uncompressed input consumed by all the maps in the job. It is incremented every time a record is read from a RecordReader and passed to the map's map method by the framework. [Hadoop Definitive Guide, page 226]

Please let us know if you find anything further.

Regards.

Sent from my iPhone

On 2013-4-6, at 0:01, Philippe Signoret wrote:

> I noticed recently that some Word Count jobs I've run are finishing with the MAP_INPUT_BYTES counter missing.
>
> I'm using Hadoop 1.1.2 with mostly default configuration with 5 nodes. The input was a single 100KB text file.
>
> Questions:
> Is it normal for any final counters values not to be present?
> Is MAP_INPUT_BYTES the best way to determine total input data size? (I do so programmatically, while it's running and after the job is complete.)
>
> The counters I did get:
>
> Job Counters
>   TOTAL_LAUNCHED_REDUCES: 1
>   SLOTS_MILLIS_MAPS: 6006
>   FALLOW_SLOTS_MILLIS_REDUCES: 0
>   FALLOW_SLOTS_MILLIS_MAPS: 0
>   TOTAL_LAUNCHED_MAPS: 1
>   DATA_LOCAL_MAPS: 1
>   SLOTS_MILLIS_REDUCES: 9293
> File Output Format Counters
>   BYTES_WRITTEN: 366752
> FileSystemCounters
>   FILE_BYTES_READ: 505552
>   HDFS_BYTES_READ: 1085517
>   FILE_BYTES_WRITTEN: 1122685
>   HDFS_BYTES_WRITTEN: 366752
> File Input Format Counters
>   BYTES_READ: 1085357
> Map-Reduce Framework
>   MAP_OUTPUT_MATERIALIZED_BYTES: 505552
>   MAP_INPUT_RECORDS: 19446
>   REDUCE_SHUFFLE_BYTES: 505552
>   SPILLED_RECORDS: 70358
>   MAP_OUTPUT_BYTES: 1750111
>   CPU_MILLISECONDS: 5700
>   COMMITTED_HEAP_BYTES: 401997824
>   COMBINE_INPUT_RECORDS: 181151
>   SPLIT_RAW_BYTES: 160
>   REDUCE_INPUT_RECORDS: 35179
>   REDUCE_INPUT_GROUPS: 35179
>   COMBINE_OUTPUT_RECORDS: 35179
>   PHYSICAL_MEMORY_BYTES: 378482688
>   REDUCE_OUTPUT_RECORDS: 35179
>   VIRTUAL_MEMORY_BYTES: 1139838976
>   MAP_OUTPUT_RECORDS: 181151
>
> Here are most of the relevant screens from the JobTracker web interface: http://jsfiddle.net/Fguyy/2/embedded/result/
>
> Here is the JobTracker log (relevant time frame): http://pastebin.com/dvsMn4fB
>
> Thanks!
> Philippe
>
> -------------------------------
> Philippe Signoret
> Skype: philippesignoret
> +33 6 95 89 55 55
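Since the question is about reading the input size programmatically when MAP_INPUT_BYTES may be absent, here is a minimal sketch of one way to handle it. It does not use the Hadoop API; it only parses a text dump of counters in the "NAME: value" form quoted above. The function names (parse_counters, input_bytes) and the fallback order MAP_INPUT_BYTES, then BYTES_READ, then HDFS_BYTES_READ are assumptions made for illustration, not something Hadoop prescribes, and the thread does not confirm that BYTES_READ reports the same quantity MAP_INPUT_BYTES would.

```python
import re

# Counter dump copied from the quoted message, trimmed to the input-related groups.
COUNTER_DUMP = """\
FileSystemCounters
 FILE_BYTES_READ: 505552
 HDFS_BYTES_READ: 1085517
File Input Format Counters
 BYTES_READ: 1085357
Map-Reduce Framework
 MAP_INPUT_RECORDS: 19446
 SPLIT_RAW_BYTES: 160
"""

# Matches lines such as " BYTES_READ: 1085357"; group headings have no
# "NAME: number" shape and are skipped.
COUNTER_LINE = re.compile(r"^\s*([A-Z_]+)\s*:\s*(\d+)\s*$")


def parse_counters(text):
    """Return a dict mapping counter name to integer value."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def input_bytes(counters,
                preferred=("MAP_INPUT_BYTES", "BYTES_READ", "HDFS_BYTES_READ")):
    """Return (counter_name, value) for the first counter in `preferred`
    that is present, or (None, None) if none of them is.

    The preference order is an assumption for this sketch.
    """
    for name in preferred:
        if name in counters:
            return name, counters[name]
    return None, None


if __name__ == "__main__":
    counters = parse_counters(COUNTER_DUMP)
    name, value = input_bytes(counters)
    print(name, value)
    # In this particular dump the two read counters differ by SPLIT_RAW_BYTES.
    print(counters["HDFS_BYTES_READ"] - counters["BYTES_READ"])
```

With the counters from this job, the lookup falls through to BYTES_READ because MAP_INPUT_BYTES is not in the dump. In these numbers, HDFS_BYTES_READ minus BYTES_READ is exactly SPLIT_RAW_BYTES (160); that is an observation about this one job, not a general rule.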