hadoop-user mailing list archives

From Robert Schmidtke <ro.schmid...@gmail.com>
Subject Re: HDFS TestDFSIO performance with 1 mapper/file of 16G per node
Date Tue, 01 Dec 2015 15:21:48 GMT
Hi Andreas,

thanks for your reply. I'm glad I'm not the only one running into this
issue. I'm currently profiling HDFS carefully, and hopefully I'll come
across something. It would seem that the OS caches all writes to disk, and
there may also be an issue with HDFS doing most of its work on-heap (wild
guess).
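
The rough arithmetic behind the caching guess (back-of-the-envelope only,
based on the counters in my original mail below):

    7 files x 16 GiB          = 112 GiB of logical data
    x 3 (replication factor)  = 336 GiB of block data hitting the disks
    / 7 DataNodes             ~  48 GiB of block data written per node

So during the write phase each DataNode receives on the order of 48G of
freshly written blocks, and Linux keeps those pages cached as long as there
is free RAM, which would fit the usage I see climbing towards the ~60G mark.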

I am running on bare metal here, with Yarn being the only layer in between.

Robert


On Tue, Dec 1, 2015 at 11:07 AM, Andreas Fritzler <andreas.fritzler@gmail.com> wrote:

> Hi Robert,
>
> I'm experiencing the same effect regarding RAM consumption when running
> TestDFSIO with huge files. I couldn't really figure out yet why this is
> happening, though.
>
> One question regarding your setup: Are you running your cluster on bare
> metal or virtualized?
>
> Regards,
> Andreas
>
> On Fri, Nov 27, 2015 at 5:07 PM, Robert Schmidtke <ro.schmidtke@gmail.com>
> wrote:
>
>> Hi everyone,
>>
>> I cannot seem to wrap my head around the TestDFSIO benchmark results. I
>> have a cluster of 8 nodes, the first one runs the NameNode and the
>> ResourceManager, the others each run a DataNode and a NodeManager. Each
>> node has 64G of RAM. I am using a block size for HDFS of 128M, and a
>> replication factor of 3. I'm running the benchmark with 7 map tasks, each
>> processing one file of 16G. The results are as follows:
>>
>> For the write phase:
>> 15/11/27 14:50:32 INFO mapreduce.Job: Counters: 49
>> File System Counters
>> FILE: Number of bytes read=626
>> FILE: Number of bytes written=929305
>> FILE: Number of read operations=0
>> FILE: Number of large read operations=0
>> FILE: Number of write operations=0
>> HDFS: Number of bytes read=1673
>> HDFS: Number of bytes written=120259084369
>> HDFS: Number of read operations=31
>> HDFS: Number of large read operations=0
>> HDFS: Number of write operations=9
>> Job Counters
>> Launched map tasks=7
>> Launched reduce tasks=1
>> Data-local map tasks=7
>> Total time spent by all maps in occupied slots (ms)=3816135
>> Total time spent by all reduces in occupied slots (ms)=277826
>> Total time spent by all map tasks (ms)=3816135
>> Total time spent by all reduce tasks (ms)=277826
>> Total vcore-seconds taken by all map tasks=3816135
>> Total vcore-seconds taken by all reduce tasks=277826
>> Total megabyte-seconds taken by all map tasks=3907722240
>> Total megabyte-seconds taken by all reduce tasks=284493824
>> Map-Reduce Framework
>> Map input records=7
>> Map output records=35
>> Map output bytes=550
>> Map output materialized bytes=662
>> Input split bytes=889
>> Combine input records=0
>> Combine output records=0
>> Reduce input groups=5
>> Reduce shuffle bytes=662
>> Reduce input records=35
>> Reduce output records=5
>> Spilled Records=70
>> Shuffled Maps =7
>> Failed Shuffles=0
>> Merged Map outputs=7
>> GC time elapsed (ms)=73040
>> CPU time spent (ms)=754470
>> Physical memory (bytes) snapshot=2070585344
>> Virtual memory (bytes) snapshot=7448678400
>> Total committed heap usage (bytes)=1346895872
>> Shuffle Errors
>> BAD_ID=0
>> CONNECTION=0
>> IO_ERROR=0
>> WRONG_LENGTH=0
>> WRONG_MAP=0
>> WRONG_REDUCE=0
>> File Input Format Counters
>> Bytes Read=784
>> File Output Format Counters
>> Bytes Written=81
>> 15/11/27 14:50:32 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
>> 15/11/27 14:50:32 INFO fs.TestDFSIO:            Date & time: Fri Nov 27
>> 14:50:32 CET 2015
>> 15/11/27 14:50:32 INFO fs.TestDFSIO:        Number of files: 7
>> 15/11/27 14:50:32 INFO fs.TestDFSIO: Total MBytes processed: 114688.0
>> 15/11/27 14:50:32 INFO fs.TestDFSIO:      Throughput mb/sec:
>> 30.260630125485257
>> 15/11/27 14:50:32 INFO fs.TestDFSIO: Average IO rate mb/sec:
>> 31.812650680541992
>> 15/11/27 14:50:32 INFO fs.TestDFSIO:  IO rate std deviation:
>> 6.862839589024802
>> 15/11/27 14:50:32 INFO fs.TestDFSIO:     Test exec time sec: 705.559
>>
>> For the read phase:
>> 15/11/27 14:56:46 INFO mapreduce.Job: Counters: 51
>> File System Counters
>> FILE: Number of bytes read=637
>> FILE: Number of bytes written=929311
>> FILE: Number of read operations=0
>> FILE: Number of large read operations=0
>> FILE: Number of write operations=0
>> HDFS: Number of bytes read=120259085961
>> HDFS: Number of bytes written=84
>> HDFS: Number of read operations=38
>> HDFS: Number of large read operations=0
>> HDFS: Number of write operations=2
>> Job Counters
>> Killed map tasks=4
>> Launched map tasks=11
>> Launched reduce tasks=1
>> Data-local map tasks=9
>> Rack-local map tasks=2
>> Total time spent by all maps in occupied slots (ms)=2096182
>> Total time spent by all reduces in occupied slots (ms)=272737
>> Total time spent by all map tasks (ms)=2096182
>> Total time spent by all reduce tasks (ms)=272737
>> Total vcore-seconds taken by all map tasks=2096182
>> Total vcore-seconds taken by all reduce tasks=272737
>> Total megabyte-seconds taken by all map tasks=2146490368
>> Total megabyte-seconds taken by all reduce tasks=279282688
>> Map-Reduce Framework
>> Map input records=7
>> Map output records=35
>> Map output bytes=561
>> Map output materialized bytes=673
>> Input split bytes=889
>> Combine input records=0
>> Combine output records=0
>> Reduce input groups=5
>> Reduce shuffle bytes=673
>> Reduce input records=35
>> Reduce output records=5
>> Spilled Records=70
>> Shuffled Maps =7
>> Failed Shuffles=0
>> Merged Map outputs=7
>> GC time elapsed (ms)=1407
>> CPU time spent (ms)=96580
>> Physical memory (bytes) snapshot=1945346048
>> Virtual memory (bytes) snapshot=7420182528
>> Total committed heap usage (bytes)=1421869056
>> Shuffle Errors
>> BAD_ID=0
>> CONNECTION=0
>> IO_ERROR=0
>> WRONG_LENGTH=0
>> WRONG_MAP=0
>> WRONG_REDUCE=0
>> File Input Format Counters
>> Bytes Read=784
>> File Output Format Counters
>> Bytes Written=84
>> 15/11/27 14:56:46 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
>> 15/11/27 14:56:46 INFO fs.TestDFSIO:            Date & time: Fri Nov 27
>> 14:56:46 CET 2015
>> 15/11/27 14:56:46 INFO fs.TestDFSIO:        Number of files: 7
>> 15/11/27 14:56:46 INFO fs.TestDFSIO: Total MBytes processed: 114688.0
>> 15/11/27 14:56:46 INFO fs.TestDFSIO:      Throughput mb/sec:
>> 101.66069077999725
>> 15/11/27 14:56:46 INFO fs.TestDFSIO: Average IO rate mb/sec:
>> 153.29443359375
>> 15/11/27 14:56:46 INFO fs.TestDFSIO:  IO rate std deviation:
>> 115.23795112362669
>> 15/11/27 14:56:46 INFO fs.TestDFSIO:     Test exec time sec: 370.396
>>
>> I usually average over a number of runs, but this is just a single one. I
>> have seen much greater execution times for both the write and the read
>> phases (up to 2x the ones stated above). I have compared the performance to
>> XtreemFS, another distributed file system, which achieves an average write
>> rate that is around 6x faster and an average read rate that is around 2.5x
>> faster, however with a much lower standard deviation. Note that HDFS
>> outperforms XtreemFS in the TestDFSIO benchmark with smaller files (1G). I
>> also don't understand why map tasks are being killed.
>>
>> I noticed that quite some time is spent on GC, so I plotted some system
>> metrics for one node running a DataNode/NodeManager (the write phase stops
>> and the read phase starts at around 14:51):
>>
>> [inline chart of system metrics not preserved in the archive]
>>
>> I am wondering why the system RAM usage goes all the way up to the Yarn
>> container limit of around 60G, when only one 16G file is being written
>> to/read from on this node.
>>
>> So, to sum up, these are the questions I would like your advice on:
>>
>> - Has anyone observed similar behavior/performance with HDFS and large
>> files?
>> - Which configuration parameters should I tweak to increase performance,
>> or which system properties should I examine so that you can give me
>> helpful advice here?
>> - Why is the RAM consumption so high?
>> - Why is the reported standard deviation during the read phase so high?
>> (See the sketch right after this list of how these numbers seem to be
>> computed.)
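>>
>> From memory of org.apache.hadoop.fs.TestDFSIO (so treat this as an
>> approximation rather than the exact code; the names in the sketch are just
>> for illustration), the numbers seem to be derived roughly like this:
>>
>>   // Each map task reports its byte count, its run time and its own MB/s
>>   // rate; the reducer sums these up and the driver then computes:
>>   class TestDFSIOMath {
>>     static void summarize(long totalBytes, long sumTaskTimeMs,
>>                           double sumRates, double sumSquaredRates,
>>                           int tasks) {
>>       final double MEGA = 1024.0 * 1024.0;
>>       // "Throughput mb/sec": total MB over the sum of all task times
>>       double throughput = (totalBytes / MEGA) / (sumTaskTimeMs / 1000.0);
>>       // "Average IO rate mb/sec": unweighted mean of the per-task rates
>>       double avgRate = sumRates / tasks;
>>       // "IO rate std deviation": sqrt(mean of squared rates - mean^2)
>>       double stdDev = Math.sqrt(Math.abs(
>>           sumSquaredRates / tasks - avgRate * avgRate));
>>       System.out.printf("%.2f %.2f %.2f%n", throughput, avgRate, stdDev);
>>     }
>>   }
>>
>> If that is right, the average is an unweighted mean over tasks, so a couple
>> of map tasks that run much faster than the rest (reading mostly from the
>> page cache?) would inflate both the average and the deviation, while the
>> aggregate throughput stays low. Is that what is happening here?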
>>
>> Thanks in advance for your time and effort!
>>
>> Robert
>>
>> --
>> My GPG Key ID: 336E2680
>>
>
>


-- 
My GPG Key ID: 336E2680
