hadoop-hdfs-issues mailing list archives

From "Hong Tang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-738) Improve the disk utilization of HDFS
Date Tue, 27 Oct 2009 17:56:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770593#action_12770593
] 

Hong Tang commented on HDFS-738:
--------------------------------

I have made some empirical observations: on Linux, "iostat -dkx 10" reports two useful
metrics, %util and avgqu-sz. %util is a fairly good indicator of disk utilization (although it
sometimes shoots above 100%). A high %util combined with a large avgqu-sz (tens to hundreds)
means the disk is overloaded.
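
As an illustration only, the rule of thumb above can be written as a small check on one iostat sample. The class name, method name, and both threshold values are assumptions made for this sketch; only the two metrics (%util and avgqu-sz) come from the observation itself.

```java
// Hypothetical helper, not part of HDFS: classifies one "iostat -dkx" sample.
public class DiskLoadCheck {
    // Assumed cutoffs. The comment above only says "high" %util and a queue
    // size in the tens to hundreds, so tune these for your own hardware.
    static final double UTIL_THRESHOLD_PCT = 90.0;
    static final double QUEUE_THRESHOLD = 10.0;

    // A disk counts as overloaded only when BOTH metrics are high:
    // a busy disk with a short queue is merely well utilized.
    static boolean isOverloaded(double utilPct, double avgQueueSize) {
        return utilPct >= UTIL_THRESHOLD_PCT && avgQueueSize >= QUEUE_THRESHOLD;
    }

    public static void main(String[] args) {
        System.out.println(isOverloaded(99.5, 42.0)); // busy and deeply queued
        System.out.println(isOverloaded(99.5, 1.2));  // busy but keeping up
        System.out.println(isOverloaded(120.0, 15.0)); // %util may exceed 100
    }
}
```

Because %util can overshoot 100%, the check uses a lower bound only and does not reject values above 100.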

> Improve the disk utilization of HDFS
> ------------------------------------
>
>                 Key: HDFS-738
>                 URL: https://issues.apache.org/jira/browse/HDFS-738
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>            Reporter: Zheng Shao
>
> The HDFS data node currently assigns writers to disks randomly. This is good if there are
> a large number of readers/writers on a single data node, but might create a lot of contention
> if there are only 4 readers/writers on a 4-disk node.
> A better way is to introduce a base class, DiskHandler, for registering all disk operations
> (read/write), as well as for getting the best disk for writing new blocks. A good DiskHandler
> strategy would be to direct writes to the disks with more free space and less recent
> activity. There can be many strategies.
> This could improve the HDFS multi-threaded write throughput a lot. We are seeing
> <25MB/s/disk on a 4-node cluster with 4 disks per node (replication is already considered),
> given 8 concurrent writers (24 writers considering replication). I believe we can improve
> that by 2x.
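
One possible shape for the DiskHandler proposed in the issue description is sketched below. It is a minimal illustration and not code from HDFS or from any patch on this issue. All names are hypothetical, and the count of in-flight operations is used as an assumed stand-in for "recent activity".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed DiskHandler; not an HDFS API.
public class DiskHandlerSketch {
    // One local disk (volume) on the data node.
    static final class Disk {
        final String name;
        long freeBytes;
        int activeOps; // reads/writes currently registered on this disk

        Disk(String name, long freeBytes) {
            this.name = name;
            this.freeBytes = freeBytes;
        }
    }

    private final List<Disk> disks = new ArrayList<>();

    synchronized void addDisk(Disk d) { disks.add(d); }

    // Register the start and end of a disk operation (read or write).
    synchronized void beginOp(Disk d) { d.activeOps++; }
    synchronized void endOp(Disk d) { d.activeOps--; }

    // One strategy among many: prefer the least busy disk and break ties
    // by the most free space. Returns null if no disks are registered.
    synchronized Disk chooseDiskForWrite() {
        Disk best = null;
        for (Disk d : disks) {
            if (best == null
                    || d.activeOps < best.activeOps
                    || (d.activeOps == best.activeOps && d.freeBytes > best.freeBytes)) {
                best = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        DiskHandlerSketch handler = new DiskHandlerSketch();
        Disk a = new Disk("disk-a", 100L);
        Disk b = new Disk("disk-b", 200L);
        handler.addDisk(a);
        handler.addDisk(b);

        Disk first = handler.chooseDiskForWrite(); // both idle, b has more space
        handler.beginOp(first);
        Disk second = handler.chooseDiskForWrite(); // b is busy, so a is chosen
        System.out.println(first.name + " " + second.name);
    }
}
```

With 4 writers on a 4-disk node, a strategy like this would place each new writer on a different disk as long as the earlier writers are still registered, which is the contention case the issue describes.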

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

