hadoop-user mailing list archives

From Robert Schmidtke <ro.schmid...@gmail.com>
Subject HDFS TestDFSIO performance with 1 mapper/file of 16G per node
Date Fri, 27 Nov 2015 16:07:36 GMT
Hi everyone,

I cannot seem to wrap my head around the TestDFSIO benchmark results. I
have a cluster of 8 nodes, the first one runs the NameNode and the
ResourceManager, the others each run a DataNode and a NodeManager. Each
node has 64G of RAM. I am using a block size for HDFS of 128M, and a
replication factor of 3. I'm running the benchmark with 7 map tasks, each
processing one file of 16G. The results are as follows:
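
(For reference, the invocation is essentially the stock TestDFSIO run from the
jobclient test jar, along the lines of

  hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
      TestDFSIO -write -nrFiles 7 -size 16GB

followed by the same command with -read; the exact jar path and whether the
size option is spelled -size or -fileSize depend on the Hadoop version.)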

For the write phase:
15/11/27 14:50:32 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=626
FILE: Number of bytes written=929305
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1673
HDFS: Number of bytes written=120259084369
HDFS: Number of read operations=31
HDFS: Number of large read operations=0
HDFS: Number of write operations=9
Job Counters
Launched map tasks=7
Launched reduce tasks=1
Data-local map tasks=7
Total time spent by all maps in occupied slots (ms)=3816135
Total time spent by all reduces in occupied slots (ms)=277826
Total time spent by all map tasks (ms)=3816135
Total time spent by all reduce tasks (ms)=277826
Total vcore-seconds taken by all map tasks=3816135
Total vcore-seconds taken by all reduce tasks=277826
Total megabyte-seconds taken by all map tasks=3907722240
Total megabyte-seconds taken by all reduce tasks=284493824
Map-Reduce Framework
Map input records=7
Map output records=35
Map output bytes=550
Map output materialized bytes=662
Input split bytes=889
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=662
Reduce input records=35
Reduce output records=5
Spilled Records=70
Shuffled Maps =7
Failed Shuffles=0
Merged Map outputs=7
GC time elapsed (ms)=73040
CPU time spent (ms)=754470
Physical memory (bytes) snapshot=2070585344
Virtual memory (bytes) snapshot=7448678400
Total committed heap usage (bytes)=1346895872
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=784
File Output Format Counters
Bytes Written=81
15/11/27 14:50:32 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
15/11/27 14:50:32 INFO fs.TestDFSIO:            Date & time: Fri Nov 27 14:50:32 CET 2015
15/11/27 14:50:32 INFO fs.TestDFSIO:        Number of files: 7
15/11/27 14:50:32 INFO fs.TestDFSIO: Total MBytes processed: 114688.0
15/11/27 14:50:32 INFO fs.TestDFSIO:      Throughput mb/sec: 30.260630125485257
15/11/27 14:50:32 INFO fs.TestDFSIO: Average IO rate mb/sec: 31.812650680541992
15/11/27 14:50:32 INFO fs.TestDFSIO:  IO rate std deviation: 6.862839589024802
15/11/27 14:50:32 INFO fs.TestDFSIO:     Test exec time sec: 705.559
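
If I read the TestDFSIO code correctly (please correct me if I am wrong), the
two rates are computed differently:

  Throughput mb/sec      = (total MB written) / (sum of per-task I/O times in s)
  Average IO rate mb/sec = mean over tasks of (MB_i / time_i)

where time_i is the I/O time measured inside map task i. As a rough sanity
check, 114688 MB / 30.26 MB/s ≈ 3790 s of aggregate in-task I/O time, which is
consistent with the ~3816 s reported above as total map task time.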

For the read phase:
15/11/27 14:56:46 INFO mapreduce.Job: Counters: 51
File System Counters
FILE: Number of bytes read=637
FILE: Number of bytes written=929311
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=120259085961
HDFS: Number of bytes written=84
HDFS: Number of read operations=38
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Killed map tasks=4
Launched map tasks=11
Launched reduce tasks=1
Data-local map tasks=9
Rack-local map tasks=2
Total time spent by all maps in occupied slots (ms)=2096182
Total time spent by all reduces in occupied slots (ms)=272737
Total time spent by all map tasks (ms)=2096182
Total time spent by all reduce tasks (ms)=272737
Total vcore-seconds taken by all map tasks=2096182
Total vcore-seconds taken by all reduce tasks=272737
Total megabyte-seconds taken by all map tasks=2146490368
Total megabyte-seconds taken by all reduce tasks=279282688
Map-Reduce Framework
Map input records=7
Map output records=35
Map output bytes=561
Map output materialized bytes=673
Input split bytes=889
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=673
Reduce input records=35
Reduce output records=5
Spilled Records=70
Shuffled Maps =7
Failed Shuffles=0
Merged Map outputs=7
GC time elapsed (ms)=1407
CPU time spent (ms)=96580
Physical memory (bytes) snapshot=1945346048
Virtual memory (bytes) snapshot=7420182528
Total committed heap usage (bytes)=1421869056
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=784
File Output Format Counters
Bytes Written=84
15/11/27 14:56:46 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
15/11/27 14:56:46 INFO fs.TestDFSIO:            Date & time: Fri Nov 27 14:56:46 CET 2015
15/11/27 14:56:46 INFO fs.TestDFSIO:        Number of files: 7
15/11/27 14:56:46 INFO fs.TestDFSIO: Total MBytes processed: 114688.0
15/11/27 14:56:46 INFO fs.TestDFSIO:      Throughput mb/sec: 101.66069077999725
15/11/27 14:56:46 INFO fs.TestDFSIO: Average IO rate mb/sec: 153.29443359375
15/11/27 14:56:46 INFO fs.TestDFSIO:  IO rate std deviation: 115.23795112362669
15/11/27 14:56:46 INFO fs.TestDFSIO:     Test exec time sec: 370.396
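
Applying the same sanity check to the read phase: 114688 MB / 101.66 MB/s ≈
1128 s of aggregate in-task I/O time, which is well below the ~2096 s of total
map task time in the job counters; I assume the difference is partly due to the
4 killed map attempts (speculative execution?) being counted there, but I am
not sure.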

I usually average over a number of runs, but the above is from a single run. I
have also seen considerably longer execution times for both the write and the
read phase (up to 2x the ones stated above). I have compared the performance to
XtreemFS, another distributed file system: on this workload it achieves an
average write rate around 6x higher and an average read rate around 2.5x higher
than HDFS, with a much lower standard deviation. Note that HDFS outperforms
XtreemFS in TestDFSIO runs with smaller files (1G). I also don't understand why
map tasks are being killed.

I noticed that quite some time is spent on GC, so I plotted some system
metrics for one node running a DataNode/NodeManager (the write phase stops
and the read phase starts at around 14:51):

[inline image: system metrics plot]

I am wondering why the system RAM usage goes all the way up to the YARN
container limit of around 60G, when only one 16G file is being written
to/read from on this node.

So, to sum up, here are the questions I would like your advice on:

- Has anyone observed similar behavior/performance with HDFS and large
files?
- Which configuration parameters would I need to tweak to increase performance,
or which system properties should I examine and report so that you can give
helpful advice here?
- Why is the RAM consumption so high?
- Why is the reported standard deviation during read phases so high?

Thanks in advance for your time and effort!

Robert

-- 
My GPG Key ID: 336E2680
