hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3672) Expose disk-location information for blocks to enable better scheduling
Date Sat, 21 Jul 2012 22:28:34 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419962#comment-13419962 ]

Todd Lipcon commented on HDFS-3672:

I understand the reluctance to add new APIs without "proof" that they're useful. But it's a
bit of a chicken-and-egg situation here. It's difficult for downstream projects to build against
a branch or an uncommitted patch.

One experiment I ran that I can report on is as follows (you may remember this from the HDFS
Performance talk I gave prior to Hadoop Summit):
- Test setup: 12 x 2TB disks on a pseudo-distributed HDFS. Write 24 files, each ~10GB, to the
local HDFS cluster.
- Read throughput test (no scheduling): Start a "hadoop fs -cat /fileN > /dev/null" for
all 24 files. Got ~700MB/sec.
- Read throughput test (simulated "scheduling"): Run 12 threads, one per data directory, each
running "find /data/N -name blk\* -exec cat {} \;". Got ~900MB/sec (roughly 30% improvement).

In each case, I ran "iostat -dxm 1" to collect disk stats on a 1-second interval. In the "unscheduled"
test, each sample showed about 8 disks at 100% utilization and 4 disks at 0% utilization.
In the "scheduled" test, all disks remained at 100% utilization.

While the above experiment is obviously more tightly controlled than a real workload, it does
show that you need to have scheduling to use all of the disks to their full potential.
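
To make the idea concrete, here is a rough sketch of what "one reader per disk" could look like on the client side. It is only an illustration: the block-to-disk map is made-up input (no such mapping is exposed by HDFS today, which is the point of this issue), and the class and method names are hypothetical. The actual block read is left as a placeholder.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PerDiskScheduler {

    /** Groups block names by the (hypothetical) disk id each block lives on. */
    static Map<Integer, List<String>> groupByDisk(Map<String, Integer> blockToDisk) {
        Map<Integer, List<String>> byDisk = new TreeMap<>();
        for (Map.Entry<String, Integer> e : blockToDisk.entrySet()) {
            byDisk.computeIfAbsent(e.getValue(), d -> new ArrayList<>()).add(e.getKey());
        }
        return byDisk;
    }

    /** Runs one sequential reader per disk and returns the number of blocks processed. */
    static int readAll(Map<Integer, List<String>> byDisk) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, byDisk.size()));
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<String> blocks : byDisk.values()) {
                // One task per disk, so each disk has exactly one active reader.
                results.add(pool.submit(() -> {
                    int read = 0;
                    for (String block : blocks) {
                        // Placeholder: a real client would read the block here.
                        read++;
                    }
                    return read;
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                try {
                    total += f.get();
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Made-up example data: three blocks spread over two disks.
        Map<String, Integer> blockToDisk = new LinkedHashMap<>();
        blockToDisk.put("blk_1", 0);
        blockToDisk.put("blk_2", 1);
        blockToDisk.put("blk_3", 0);
        System.out.println(readAll(groupByDisk(blockToDisk))); // prints 3
    }
}
```

Without disk ids, a client cannot build the grouping in the first place, so concurrent readers land on disks more or less at random.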

Would a fair compromise be to mark the new API as @InterfaceStability.Unstable so that people
understand it's experimental and may change or disappear in future releases? Given that the
use cases for it are performance enhancement only, it seems like people could simply wrap
the call in a try/catch so that, if the API ends up throwing an UnsupportedOperationException
in a future version, it would just fall back to the slower un-scheduled path.
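
A minimal sketch of that fallback pattern, under stated assumptions: BlockLocator and getDiskIds are hypothetical stand-ins invented for this example, not the API from the attached patch, and the returned strings merely mark which path was taken.

```java
import java.util.List;

final class DiskLocationFallback {

    /** Hypothetical stand-in for a client that may or may not support disk-location lookups. */
    interface BlockLocator {
        /** Returns one disk id per block, or throws UnsupportedOperationException. */
        List<Integer> getDiskIds(String path);
    }

    /** Uses disk ids when they are available, otherwise takes the unscheduled path. */
    static String planReads(BlockLocator locator, String path) {
        try {
            List<Integer> diskIds = locator.getDiskIds(path);
            return "scheduled:" + diskIds;
        } catch (UnsupportedOperationException e) {
            // The experimental API is gone or unimplemented: fall back to the slower path.
            return "unscheduled";
        }
    }

    public static void main(String[] args) {
        BlockLocator supported = path -> List.of(0, 3, 7);
        BlockLocator removed = path -> {
            throw new UnsupportedOperationException("disk ids not supported");
        };
        System.out.println(planReads(supported, "/file1")); // prints scheduled:[0, 3, 7]
        System.out.println(planReads(removed, "/file1"));   // prints unscheduled
    }
}
```

Because the disk ids are only a scheduling hint, the caller loses throughput but not correctness when the call is unsupported.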

> Expose disk-location information for blocks to enable better scheduling
> -----------------------------------------------------------------------
>                 Key: HDFS-3672
>                 URL: https://issues.apache.org/jira/browse/HDFS-3672
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.0.0-alpha
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: hdfs-3672-1.patch
> Currently, HDFS exposes on which datanodes a block resides, which allows clients to make
> scheduling decisions for locality and load balancing. Extending this to also expose on which
> disk on a datanode a block resides would enable even better scheduling, on a per-disk rather
> than coarse per-datanode basis.
> This API would likely look similar to FileSystem#getFileBlockLocations, but also involve
> a series of RPCs to the responsible datanodes to determine disk ids.
