hadoop-user mailing list archives

From "Clay B." <...@clayb.net>
Subject Re: I/O stats interpretation during concurrent hive M/R runs
Date Mon, 27 Aug 2012 19:34:56 GMT
Hi Himanish,

I would strongly recommend you read the man page[1] for iostat to get a good 
baseline understanding of what the command is telling you; understandably, 
that is only so helpful on its own. Hadoop behaves like any other user-land 
application when you look at a single process, but on a datanode many compute 
processes may be using the disks at any one time.
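
If you want to separate the datanode's own I/O from that of the task 
processes, a per-process view helps; for example, pidstat from the same 
sysstat package that provides iostat (the 5-second interval here is just an 
example):

    pidstat -d 5

shows per-process read/write rates, provided your kernel has per-process I/O 
accounting enabled. iotop is another option, if it is installed.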

If the job you want to analyze is the only one running, then you do not need 
to worry about interaction effects, but you will want to make sure you are 
looking at consistent phases of the map/reduce; e.g. was your snapshot taken 
during only the map phase or only the reduce phase (I/O usage will vary 
between the two)?
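
One way to line the two up: iostat's -t flag timestamps each report, so you 
can correlate a given snapshot with the job's map/reduce progress in the 
JobTracker web UI, e.g.:

    iostat -t -dxm 5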

As you correctly note, the %util column tells you how much of the time the 
device spent servicing I/O requests[2]. However, looking at the wait times and 
the request queue size is quite important[3] for understanding whether the 
machine is 100% utilized doing I/O -- issuing requests as fast as it can -- or 
whether it is waiting for I/O to complete and could issue more requests were 
there more spindles. It is also useful to look at the %iowait column to see 
how much processor time was spent blocked on I/O requests.
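
For reference, those extended statistics come from iostat's -x flag; in its 
output the columns to watch for this are roughly avgqu-sz (average request 
queue length), await (average time a request spends queued plus being 
serviced, in ms) and %util, e.g.:

    iostat -dxm 5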

I do not see in your e-mail the lines providing %util, queue size, etc., so I 
cannot comment on those.

Next, Hadoop-specific I/O tuning can help. Depending on your job's goals, you 
may be able to change your HDFS block size, your on-disk and intermediate 
compression codecs, and your number of reducers to optimize I/O usage. There 
is plenty written on all of this already; a sketch of the sort of knobs 
involved follows.
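
As a sketch only -- the property names below are the MRv1-era ones and the 
values are purely illustrative, so check them against your Hadoop/Hive 
versions -- you could experiment from the Hive session with something like:

    SET mapred.compress.map.output=true;
    SET mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
    SET mapred.reduce.tasks=32;
    -- 268435456 = 256MB blocks for files this job writes
    SET dfs.block.size=268435456;

Compressing the intermediate map output in particular trades CPU for disk and 
network I/O, which can help when the disks are the bottleneck.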

-Clay

[1]: Run the command "man iostat" or see:
http://linuxcommand.org/man_pages/iostat1.html
[2]: A much better breakdown explaining the %util metric:
http://stackoverflow.com/questions/4458183/how-the-util-of-iostat-is-computed
[3]: A reasonable explanation of I/O wait times on a Linux machine (despite 
being for MySQL and not Hadoop, it is perfectly relevant): 
http://www.dbasquare.com/2012/04/18/analyzing-io-performance/

On Thu, 23 Aug 2012, Himanish Kushary wrote:

> After sending this message I issued the iostat -dxm 5 command on the DNs;
> the %util column shows a 70-80 average value, sometimes going up to 90-100
> for a few seconds.
> Does this mean the disk is becoming the bottleneck, or is this normal?
> 
> On Thu, Aug 23, 2012 at 3:14 PM, Himanish Kushary <himanish@gmail.com>
> wrote:
>       Hi,
> I am curious about the interpretation of the output from iostat on a
> datanode during an M/R run. I want to understand how to diagnose a disk
> I/O issue in a Hadoop cluster.
> 
> Is there any good documentation to help me understand the results from
> iostat in a Hadoop context?
> 
> Here is the iostat output from a DN while two intensive M/R jobs were
> executing. Does this result indicate any performance issue related to the
> disks?
> 
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda               1.80         0.00        59.20          0        296
> sdb            1436.20     96376.00    211424.00     481880    1057120
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           22.81    0.00   45.28    3.59    0.00   28.32
> 
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda               2.80        25.60        80.00        128        400
> sdb            1073.60     45891.20    203473.60     229456    1017368
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           17.28    0.00   74.49    0.32    0.00    7.92
> 
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda               3.44         0.00        83.97          0        440
> sdb            1174.62     52370.99    209789.31     274424    1099296
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           51.72    0.00   47.60    0.31    0.00    0.38
> 
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda               1.20         0.00        22.40          0        112
> sdb            1094.20     67492.80    177187.20     337464     885936
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           43.73    0.00   36.19    3.03    0.00   17.05
> 
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda               1.80         0.00        46.40          0        232
> sdb            1241.20    100969.60    162806.40     504848     814032
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           37.09    0.00   58.61    0.77    0.00    3.54
> 
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda               4.60       182.40        19.20        912         96
> sdb            1235.20     47780.80    235912.00     238904    1179560
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           47.23    0.00   42.49    3.09    0.00    7.19
> 
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda               1.60         0.00        46.40          0        232
> sdb            1005.20     86502.40    135886.40     432512     679432
> 
> 
> 
> --------------------------- 
> Thanks & Regards
> Himanish
> 
> 
> 
> 
> --
> Thanks & Regards
> Himanish
>