hadoop-common-user mailing list archives

From Daniel Baptista <Daniel.Bapti...@performgroup.com>
Subject Spilled Records
Date Tue, 28 Feb 2012 13:25:32 GMT
Hi All,

I am trying to improve the performance of my Hadoop cluster and would like to get some feedback
on a couple of numbers that I am seeing.

Below is the output from a single task (1 of 16) that took 3 minutes and 40 seconds:

FileSystemCounters
FILE_BYTES_READ 214,653,748
HDFS_BYTES_READ 67,108,864
FILE_BYTES_WRITTEN 429,278,388

Map-Reduce Framework
Combine output records 0
Map input records 2,221,478
Spilled Records 4,442,956
Map output bytes 210,196,148
Combine input records 0
Map output records 2,221,478

And another task in the same job (16 of 16) that took 7 minutes and 19 seconds:

FileSystemCounters
FILE_BYTES_READ 199,003,192
HDFS_BYTES_READ 58,434,476
FILE_BYTES_WRITTEN 397,975,310

Map-Reduce Framework
Combine output records 0
Map input records 2,086,789
Spilled Records 4,173,578
Map output bytes 194,813,958
Combine input records 0
Map output records 2,086,789

Can anybody determine anything from these figures?

The first task is twice as quick as the second, yet the input and output are comparable (certainly
not double). In all of the tasks (in this and other jobs) the spilled records are always double
the output records. Surely this can't be 'normal'?
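To make the comparison concrete, here is a minimal Python sketch of the arithmetic behind those two observations. The numbers are copied from the two counter dumps above; the dictionary keys and the `summarize` helper are hypothetical names used only for illustration and are not part of any Hadoop API.

```python
# Counter values copied from the two task reports above.
# Key names are illustrative only, not Hadoop identifiers.
tasks = {
    "task 1 of 16": {
        "seconds": 3 * 60 + 40,
        "map_input_records": 2_221_478,
        "map_output_records": 2_221_478,
        "map_output_bytes": 210_196_148,
        "spilled_records": 4_442_956,
    },
    "task 16 of 16": {
        "seconds": 7 * 60 + 19,
        "map_input_records": 2_086_789,
        "map_output_records": 2_086_789,
        "map_output_bytes": 194_813_958,
        "spilled_records": 4_173_578,
    },
}


def summarize(counters):
    """Derive the ratios discussed in the message from one task's counters."""
    return {
        # Spilled records per map output record.
        "spill_ratio": counters["spilled_records"] / counters["map_output_records"],
        # Rough throughput of the task in input records per second.
        "records_per_second": counters["map_input_records"] / counters["seconds"],
    }


summaries = {name: summarize(c) for name, c in tasks.items()}

first, last = tasks["task 1 of 16"], tasks["task 16 of 16"]
# How much longer the slow task ran, and how its input size compares.
time_ratio = last["seconds"] / first["seconds"]
input_ratio = last["map_input_records"] / first["map_input_records"]

for name, s in summaries.items():
    print(name, s)
print("time ratio (slow/fast):", round(time_ratio, 2))
print("input ratio (slow/fast):", round(input_ratio, 2))
```

With these figures the spill ratio is exactly 2 for both tasks, while the slower task ran roughly twice as long on about 0.94 times the input, which is the mismatch I am asking about.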

Am I clutching at straws? (It feels like I am.)

Thanks in advance, Dan.

