hadoop-mapreduce-user mailing list archives

From Ranjini Rathinam <ranjinibe...@gmail.com>
Subject Re: issue about total input byte of MR job
Date Tue, 03 Dec 2013 08:45:57 GMT
This is the input. Please help with a code example.


<Company>
  <Employee>
    <id>100</id>
    <ename>ert</ename>
    <Address>
      <home>eewre</home>
      <office>wefwef</office>
    </Address>
  </Employee>
</Company>
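Assuming the mapper receives one whole Employee record as its input value (for example via an XML-aware InputFormat such as Mahout's XmlInputFormat; that choice is an assumption, nothing in this thread configures it), the field extraction could be sketched with the JDK's built-in DOM parser. Note the tag case must be consistent (`<Address>...</Address>`, `<Employee>...</Employee>`) or the parse will fail:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class EmployeeParser {
    // Hypothetical record: in a real job this string would be the value
    // handed to the mapper by the XML-aware InputFormat. Tag case has been
    // made consistent so that the document is well-formed.
    static final String RECORD =
        "<Company><Employee><id>100</id><ename>ert</ename>"
        + "<Address><home>eewre</home><office>wefwef</office></Address>"
        + "</Employee></Company>";

    // Returns the text content of the first element with the given tag name.
    static String parseField(String record, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(record.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // In a mapper these would typically be emitted as key/value,
        // e.g. context.write(new Text(id), new Text(ename)).
        String id = parseField(RECORD, "id");
        String ename = parseField(RECORD, "ename");
        System.out.println(id + "\t" + ename); // prints "100	ert"
    }
}
```

For large records a streaming parser (StAX) would avoid building the whole DOM in memory, but the DOM version keeps the sketch short.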

On Tue, Dec 3, 2013 at 2:11 PM, Jeff Zhang <jezhang@gopivotal.com> wrote:

> It depends on your input data. E.g. if your input consists of 10 files,
> each 65M, then each file takes 2 mappers, so overall the job uses 20
> mappers, but the total input size is actually 650M rather than 20*64M =
> 1280M.
>
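The arithmetic in that example can be sketched as ceiling division of file size by block size (this ignores the ~10% slop FileInputFormat allows on the last split, so real split counts can be slightly lower; the sizes here are the hypothetical ones from the example, in MB):

```java
public class SplitMath {
    // Number of splits for one file, assuming split size == block size
    // (the default when no min/max split size is configured).
    static long splitsFor(long fileSize, long blockSize) {
        return (fileSize + blockSize - 1) / blockSize; // ceiling division
    }

    public static void main(String[] args) {
        long blockMB = 64, fileMB = 65, files = 10;
        long splitsPerFile = splitsFor(fileMB, blockMB);  // 2 (64M + 1M tail)
        long totalSplits = files * splitsPerFile;         // 20 mappers
        long actualInputMB = files * fileMB;              // 650 MB actually read
        long naiveEstimateMB = totalSplits * blockMB;     // 1280 MB naive estimate
        System.out.println(totalSplits + " splits, " + actualInputMB
            + " MB actual vs " + naiveEstimateMB + " MB naive estimate");
    }
}
```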
>
> On Tue, Dec 3, 2013 at 4:28 PM, ch huang <justlooks@gmail.com> wrote:
>
>> I ran the MR job, and in the MR output I see
>>
>> 13/12/03 14:02:28 INFO mapreduce.JobSubmitter: number of splits:2717
>>
>> Because each of my data blocks is 64M, the total should be 2717*64M/1024 =
>> 170G.
>>
>> But in the summary at the end I see the following info, so the HDFS bytes
>> read are 126792190158/1024/1024/1024 = 118G. The two numbers are not very
>> close. Why?
>>
>>         File System Counters
>>                 FILE: Number of bytes read=9642910241
>>                 FILE: Number of bytes written=120327706125
>>                 FILE: Number of read operations=0
>>                 FILE: Number of large read operations=0
>>                 FILE: Number of write operations=0
>>                 HDFS: Number of bytes read=126792190158
>>                 HDFS: Number of bytes written=0
>>                 HDFS: Number of read operations=8151
>>                 HDFS: Number of large read operations=0
>>                 HDFS: Number of write operations=0
>>
>
>
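The gap between 170G and 118G follows from the effect described above: multiplying the split count by the full 64M block size overestimates the input, because the last block of most files is only partly full. Reversing the arithmetic with the counters quoted above shows the splits average well under 64M (the numbers are taken directly from the job output; the interpretation is a sketch, not a definitive diagnosis):

```java
public class CounterMath {
    static double toGB(long bytes) { return bytes / (double) (1L << 30); }

    public static void main(String[] args) {
        long hdfsBytesRead = 126_792_190_158L; // HDFS: Number of bytes read
        long splits = 2717;                    // number of splits:2717
        long blockBytes = 64L << 20;           // 64M block size

        double naiveGB = splits * toGB(blockBytes);              // ~169.8 GB
        double actualGB = toGB(hdfsBytesRead);                   // ~118.1 GB
        double avgSplitMB = hdfsBytesRead / (double) splits / (1 << 20); // ~44.5 MB

        System.out.printf("naive=%.1fGB actual=%.1fGB avgSplit=%.1fMB%n",
            naiveGB, actualGB, avgSplitMB);
    }
}
```

If every split really were a full 64M block, the two numbers would match; an average split size around 44M simply means the input is many files whose sizes are not multiples of the block size.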
