hadoop-user mailing list archives

From nishan shetty <nishan.she...@huawei.com>
Subject RE: issue about total input byte of MR job
Date Tue, 03 Dec 2013 08:46:53 GMT
Hi Ch Huang

Are you sure your input data size is 170G?
2717 splits do not necessarily add up to 170G (your calculation of 2717*64M/1024), because that figure assumes every split is a full 64M block.
Each file is treated as at least one separate split, and that split may be much smaller than 64M.
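
To illustrate why the split count alone can overstate the input size, here is a minimal Python sketch. It assumes a simplified model in which each non-empty file is cut into ceil(size / 64M) splits; the real FileInputFormat logic has more parameters (minimum/maximum split size and a small slack on the last split) that are ignored here. The file sizes are made up for illustration, and the helper name count_splits is hypothetical.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3
BLOCK_SIZE = 64 * MIB  # block size stated in the original question


def count_splits(file_sizes, split_size=BLOCK_SIZE):
    """Simplified model (an assumption, not Hadoop's exact logic):
    each non-empty file yields ceil(size / split_size) splits."""
    return sum(math.ceil(size / split_size) for size in file_sizes if size > 0)


# Hypothetical input: 100 small files of 10 MiB each plus one 200 MiB file.
file_sizes = [10 * MIB] * 100 + [200 * MIB]

splits = count_splits(file_sizes)
actual_bytes = sum(file_sizes)
estimated_bytes = splits * BLOCK_SIZE  # the "splits * block size" estimate

print(f"splits: {splits}")
print(f"actual size: {actual_bytes / GIB:.2f} GiB")
print(f"splits * block size: {estimated_bytes / GIB:.2f} GiB")
```

With these made-up sizes the estimate is several times larger than the real total, because most splits are far smaller than a block.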

Please cross-check the actual input size using the CLI, for example with hadoop fs -du -s <input path>.


From: ch huang [mailto:justlooks@gmail.com]
Sent: 03 December 2013 01:58 PM
To: user@hadoop.apache.org
Subject: issue about total input byte of MR job

I ran the MR job, and in its output I see:

13/12/03 14:02:28 INFO mapreduce.JobSubmitter: number of splits:2717

Because each of my data blocks is 64M, the total should be 2717*64M/1024 = 170G.

But in the summary at the end I see the following counters, where the HDFS bytes read are 126792190158/1024/1024/1024
= 118G. The two numbers are not very close. Why?

        File System Counters
                FILE: Number of bytes read=9642910241
                FILE: Number of bytes written=120327706125
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=126792190158
                HDFS: Number of bytes written=0
                HDFS: Number of read operations=8151
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=0
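
As a quick check of the arithmetic above, the following Python sketch recomputes both figures from the numbers in this message and derives the average bytes read per split. The variable names are chosen here for illustration; only the split count, the 64M block size, and the HDFS bytes-read counter come from the job output.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

num_splits = 2717               # from "number of splits:2717"
block_size = 64 * MIB           # block size stated in the question
hdfs_bytes_read = 126792190158  # from "HDFS: Number of bytes read"

# Estimate that assumes every split is a full block.
full_block_estimate_gib = num_splits * block_size / GIB

# What the job actually read from HDFS.
actual_gib = hdfs_bytes_read / GIB

# Average amount read per split, for comparison with the 64 MiB block size.
avg_split_mib = hdfs_bytes_read / num_splits / MIB

print(f"full-block estimate: {full_block_estimate_gib:.1f} GiB")
print(f"HDFS bytes read:     {actual_gib:.1f} GiB")
print(f"average per split:   {avg_split_mib:.1f} MiB")
```

An average well below 64 MiB per split would be consistent with many splits covering files, or file tails, that are smaller than one block.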
