hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3672) Expose disk-location information for blocks to enable better scheduling
Date Mon, 06 Aug 2012 19:09:03 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429354#comment-13429354

Aaron T. Myers commented on HDFS-3672:

Patch looks pretty good to me. A few comments:

# In DFSClient#getDiskBlockLocations, I recommend you add an instanceof check before the BlockLocation
downcast to HdfsBlockLocation. Much better to throw a helpful RTE than some opaque ClassCastException.
# The DFSClient#getDiskBlockLocations method is huge, and has a few very distinct phases.
I recommend you break this up into a few separate helper methods, e.g. one or two to initialize
the data structures, one or two to perform the RPCs, one to re-associate the DN results with
the correct block, etc.
# Unless I'm missing something, seems like you could easily make DiskBlockLocationCallable
a static inner class.
# The javadoc parameter comment "@param blocks a List<LocatedBlock>" is not very helpful,
since when the javadocs are generated the type of the parameter will automatically be included.
# The javadoc for DFSClient#getDiskBlockLocations should be a proper javadoc, i.e. with @param
and @returns tags. I also recommend having this javadoc reference DistributedFileSystem#getFileDiskBlockLocations.
# In the new javadoc in DistributedFileSystem, you incorrectly say that this interface exists
in the FileSystem class as well, and say "this is more helpful with DFS", which is the only
# I think you should change the LimitedPrivate InterfaceAudience annotations to Public, but
keep the Unstable InterfaceStability annotations.
# Put a single space around your operators, e.g. "for (int i=0; i<blocks.size(); i++)"
# Unless I'm missing something, I don't think I see the ability to disable this feature, let
alone it being off by default, as Arun requested.
> Expose disk-location information for blocks to enable better scheduling
> -----------------------------------------------------------------------
>                 Key: HDFS-3672
>                 URL: https://issues.apache.org/jira/browse/HDFS-3672
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.0.0-alpha
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: design-doc-v1.pdf, hdfs-3672-1.patch, hdfs-3672-2.patch, hdfs-3672-3.patch,
hdfs-3672-4.patch, hdfs-3672-5.patch
> Currently, HDFS exposes on which datanodes a block resides, which allows clients to make
scheduling decisions for locality and load balancing. Extending this to also expose on which
disk on a datanode a block resides would enable even better scheduling, on a per-disk rather
than coarse per-datanode basis.
> This API would likely look similar to Filesystem#getFileBlockLocations, but also involve
a series of RPCs to the responsible datanodes to determine disk ids.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message