hadoop-common-user mailing list archives

From Sath <neiodavi...@gmail.com>
Subject Unsubscribe
Date Tue, 01 Dec 2015 23:47:29 GMT
Unsubscribe 

Sent from my iPhone

> On Dec 1, 2015, at 3:27 PM, Aladdin Elston <aelston@aerohive.com> wrote:
> 
> 
> Thank you, Aladdin
> 
> 
> From: Robert Schmidtke
> Date: Tuesday, December 1, 2015 at 7:21 AM
> To: "user@hadoop.apache.org"
> Subject: Re: HDFS TestDFSIO performance with 1 mapper/file of 16G per node
> 
> Hi Andreas,
> 
> Thanks for your reply. I'm glad I'm not the only one running into this issue. I'm
> currently in the process of profiling HDFS carefully, and hopefully I'll come across
> something. It would seem that the OS caches all writes to disk, and there may be
> problems with HDFS being mostly on-heap (wild guess).
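> 
> As a first step I'm simply watching the page cache on the DataNodes while the benchmark
> runs, with plain Linux tooling along these lines:
> 
>     # report used memory and page cache size every 5 seconds
>     while true; do
>         free -m | awk 'NR==2 {print $3 " MB used, " $6 " MB cached"}'
>         sleep 5
>     done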
> 
> I am running on bare metal here, with YARN being the only layer in between.
> 
> Robert
> 
> 
>> On Tue, Dec 1, 2015 at 11:07 AM, Andreas Fritzler <andreas.fritzler@gmail.com> wrote:
>> Hi Robert,
>> 
>> I'm experiencing the same effect regarding RAM consumption when running TestDFSIO
>> with huge files. I couldn't really figure out yet why this is happening, though.
>> 
>> One question regarding your setup: Are you running your cluster on bare metal or
>> virtualized?
>> 
>> Regards,
>> Andreas
>> 
>>> On Fri, Nov 27, 2015 at 5:07 PM, Robert Schmidtke <ro.schmidtke@gmail.com> wrote:
>>> Hi everyone,
>>> 
>>> I cannot seem to wrap my head around the TestDFSIO benchmark results. I have a
>>> cluster of 8 nodes; the first one runs the NameNode and the ResourceManager, and the
>>> others each run a DataNode and a NodeManager. Each node has 64G of RAM. I am using an
>>> HDFS block size of 128M and a replication factor of 3. I'm running the benchmark with
>>> 7 map tasks, each processing one file of 16G.
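>>> 
>>> For reference, the invocation is along these lines (the exact name of the tests jar
>>> depends on the Hadoop version, so this is from memory):
>>> 
>>>     hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
>>>         TestDFSIO -write -nrFiles 7 -size 16GB
>>>     # the read phase is the same with -read; block size and replication factor
>>>     # can be double-checked with:
>>>     hdfs getconf -confKey dfs.blocksize      # 134217728, i.e. 128M
>>>     hdfs getconf -confKey dfs.replication    # 3
>>> 
>>> The results are as follows: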
>>> 
>>> For the write phase:
>>> 15/11/27 14:50:32 INFO mapreduce.Job: Counters: 49
>>> File System Counters
>>> FILE: Number of bytes read=626
>>> FILE: Number of bytes written=929305
>>> FILE: Number of read operations=0
>>> FILE: Number of large read operations=0
>>> FILE: Number of write operations=0
>>> HDFS: Number of bytes read=1673
>>> HDFS: Number of bytes written=120259084369
>>> HDFS: Number of read operations=31
>>> HDFS: Number of large read operations=0
>>> HDFS: Number of write operations=9
>>> Job Counters
>>> Launched map tasks=7
>>> Launched reduce tasks=1
>>> Data-local map tasks=7
>>> Total time spent by all maps in occupied slots (ms)=3816135
>>> Total time spent by all reduces in occupied slots (ms)=277826
>>> Total time spent by all map tasks (ms)=3816135
>>> Total time spent by all reduce tasks (ms)=277826
>>> Total vcore-seconds taken by all map tasks=3816135
>>> Total vcore-seconds taken by all reduce tasks=277826
>>> Total megabyte-seconds taken by all map tasks=3907722240
>>> Total megabyte-seconds taken by all reduce tasks=284493824
>>> Map-Reduce Framework
>>> Map input records=7
>>> Map output records=35
>>> Map output bytes=550
>>> Map output materialized bytes=662
>>> Input split bytes=889
>>> Combine input records=0
>>> Combine output records=0
>>> Reduce input groups=5
>>> Reduce shuffle bytes=662
>>> Reduce input records=35
>>> Reduce output records=5
>>> Spilled Records=70
>>> Shuffled Maps =7
>>> Failed Shuffles=0
>>> Merged Map outputs=7
>>> GC time elapsed (ms)=73040
>>> CPU time spent (ms)=754470
>>> Physical memory (bytes) snapshot=2070585344
>>> Virtual memory (bytes) snapshot=7448678400
>>> Total committed heap usage (bytes)=1346895872
>>> Shuffle Errors
>>> BAD_ID=0
>>> CONNECTION=0
>>> IO_ERROR=0
>>> WRONG_LENGTH=0
>>> WRONG_MAP=0
>>> WRONG_REDUCE=0
>>> File Input Format Counters
>>> Bytes Read=784
>>> File Output Format Counters
>>> Bytes Written=81
>>> 15/11/27 14:50:32 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
>>> 15/11/27 14:50:32 INFO fs.TestDFSIO:            Date & time: Fri Nov 27 14:50:32 CET 2015
>>> 15/11/27 14:50:32 INFO fs.TestDFSIO:        Number of files: 7
>>> 15/11/27 14:50:32 INFO fs.TestDFSIO: Total MBytes processed: 114688.0
>>> 15/11/27 14:50:32 INFO fs.TestDFSIO:      Throughput mb/sec: 30.260630125485257
>>> 15/11/27 14:50:32 INFO fs.TestDFSIO: Average IO rate mb/sec: 31.812650680541992
>>> 15/11/27 14:50:32 INFO fs.TestDFSIO:  IO rate std deviation: 6.862839589024802
>>> 15/11/27 14:50:32 INFO fs.TestDFSIO:     Test exec time sec: 705.559
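>>> 
>>> (If I read the TestDFSIO output correctly, "Throughput mb/sec" is the total data
>>> volume divided by the sum of the per-task I/O times, while "Average IO rate mb/sec"
>>> is the mean of the per-file rates. As a sanity check: 114688 MB / 30.26 MB/s is about
>>> 3790 s, which roughly matches the 3816 s total map task time, i.e. the maps spend
>>> essentially all of their time writing.)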
>>> 
>>> For the read phase:
>>> 15/11/27 14:56:46 INFO mapreduce.Job: Counters: 51
>>> File System Counters
>>> FILE: Number of bytes read=637
>>> FILE: Number of bytes written=929311
>>> FILE: Number of read operations=0
>>> FILE: Number of large read operations=0
>>> FILE: Number of write operations=0
>>> HDFS: Number of bytes read=120259085961
>>> HDFS: Number of bytes written=84
>>> HDFS: Number of read operations=38
>>> HDFS: Number of large read operations=0
>>> HDFS: Number of write operations=2
>>> Job Counters
>>> Killed map tasks=4
>>> Launched map tasks=11
>>> Launched reduce tasks=1
>>> Data-local map tasks=9
>>> Rack-local map tasks=2
>>> Total time spent by all maps in occupied slots (ms)=2096182
>>> Total time spent by all reduces in occupied slots (ms)=272737
>>> Total time spent by all map tasks (ms)=2096182
>>> Total time spent by all reduce tasks (ms)=272737
>>> Total vcore-seconds taken by all map tasks=2096182
>>> Total vcore-seconds taken by all reduce tasks=272737
>>> Total megabyte-seconds taken by all map tasks=2146490368
>>> Total megabyte-seconds taken by all reduce tasks=279282688
>>> Map-Reduce Framework
>>> Map input records=7
>>> Map output records=35
>>> Map output bytes=561
>>> Map output materialized bytes=673
>>> Input split bytes=889
>>> Combine input records=0
>>> Combine output records=0
>>> Reduce input groups=5
>>> Reduce shuffle bytes=673
>>> Reduce input records=35
>>> Reduce output records=5
>>> Spilled Records=70
>>> Shuffled Maps =7
>>> Failed Shuffles=0
>>> Merged Map outputs=7
>>> GC time elapsed (ms)=1407
>>> CPU time spent (ms)=96580
>>> Physical memory (bytes) snapshot=1945346048
>>> Virtual memory (bytes) snapshot=7420182528
>>> Total committed heap usage (bytes)=1421869056
>>> Shuffle Errors
>>> BAD_ID=0
>>> CONNECTION=0
>>> IO_ERROR=0
>>> WRONG_LENGTH=0
>>> WRONG_MAP=0
>>> WRONG_REDUCE=0
>>> File Input Format Counters
>>> Bytes Read=784
>>> File Output Format Counters
>>> Bytes Written=84
>>> 15/11/27 14:56:46 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
>>> 15/11/27 14:56:46 INFO fs.TestDFSIO:            Date & time: Fri Nov 27 14:56:46 CET 2015
>>> 15/11/27 14:56:46 INFO fs.TestDFSIO:        Number of files: 7
>>> 15/11/27 14:56:46 INFO fs.TestDFSIO: Total MBytes processed: 114688.0
>>> 15/11/27 14:56:46 INFO fs.TestDFSIO:      Throughput mb/sec: 101.66069077999725
>>> 15/11/27 14:56:46 INFO fs.TestDFSIO: Average IO rate mb/sec: 153.29443359375
>>> 15/11/27 14:56:46 INFO fs.TestDFSIO:  IO rate std deviation: 115.23795112362669
>>> 15/11/27 14:56:46 INFO fs.TestDFSIO:     Test exec time sec: 370.396
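>>> 
>>> (Note the spread here: an average IO rate of 153 mb/sec with a standard deviation of
>>> 115 means the per-file read rates vary wildly; this would be consistent with some
>>> maps reading data from the page cache and others going to disk or across the network.)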
>>> 
>>> I usually average over a number of runs, but this is just a single one. I have seen
>>> much greater execution times for both the write and the read phases (up to 2x the
>>> ones stated above). I have compared the performance to XtreemFS, another distributed
>>> file system: it achieves an average write rate around 6x faster and an average read
>>> rate around 2.5x faster, however with a much lower standard deviation. Note that HDFS
>>> outperforms XtreemFS in terms of TestDFSIO benchmark results with smaller files (1G).
>>> I also don't understand why map tasks are being killed.
>>> 
>>> I noticed that quite some time is spent on GC, so I plotted some system metrics for
>>> one node running a DataNode/NodeManager (the write phase stops and the read phase
>>> starts at around 14:51):
>>> 
>>> <datanode.png>
>>> 
>>> I am wondering why the system RAM usage goes all the way up to the YARN container
>>> limit of around 60G, when only one 16G file is being written to/read from on this
>>> node.
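>>> 
>>> One thing I still want to try is dropping the caches between the write and the read
>>> phase on each node, to check whether this is just the kernel page cache rather than
>>> memory actually held by the DataNode:
>>> 
>>>     sync                                        # flush dirty pages first
>>>     echo 3 | sudo tee /proc/sys/vm/drop_caches  # drop page cache, dentries and inodes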
>>> 
>>> So, to sum up, these are the questions I would like your advice on:
>>> 
>>> - Has anyone observed similar behavior/performance with HDFS and large files?
>>> - Which configuration parameters would I need to tweak in order to increase
>>>   performance, or which system properties should I examine so that you can give me
>>>   helpful advice here? (A few candidate settings are listed below.)
>>> - Why is the RAM consumption so high?
>>> - Why is the reported standard deviation during read phases so high?
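>>> 
>>> (On the configuration side I have come across the DataNode settings
>>> dfs.datanode.drop.cache.behind.writes, dfs.datanode.drop.cache.behind.reads and
>>> dfs.datanode.sync.behind.writes, which sound related to the caching behavior, but I
>>> have not tried them yet.)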
>>> 
>>> Thanks in advance for your time and effort!
>>> 
>>> Robert
>>> 
>>> -- 
>>> My GPG Key ID: 336E2680
> 
> 
> 
> -- 
> My GPG Key ID: 336E2680
> <datanode.png>
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@hadoop.apache.org
> For additional commands, e-mail: user-help@hadoop.apache.org
