hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3672) Expose disk-location information for blocks to enable better scheduling
Date Mon, 06 Aug 2012 19:09:03 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429354#comment-13429354
] 

Aaron T. Myers commented on HDFS-3672:
--------------------------------------

Patch looks pretty good to me. A few comments:

# In DFSClient#getDiskBlockLocations, I recommend you add an instanceof check before the BlockLocation
downcast to HdfsBlockLocation. Much better to throw a helpful RTE than some opaque ClassCastException.
# The DFSClient#getDiskBlockLocations method is huge, and has a few very distinct phases.
I recommend you break this up into a few separate helper methods, e.g. one or two to initialize
the data structures, one or two to perform the RPCs, one to re-associate the DN results with
the correct block, etc.
# Unless I'm missing something, seems like you could easily make DiskBlockLocationCallable
a static inner class.
# The javadoc parameter comment "@param blocks a List<LocatedBlock>" is not very helpful,
since when the javadocs are generated the type of the parameter will automatically be included.
# The javadoc for DFSClient#getDiskBlockLocations should be a proper javadoc, i.e. with @param
and @returns tags. I also recommend having this javadoc reference DistributedFileSystem#getFileDiskBlockLocations.
# In the new javadoc in DistributedFileSystem, you incorrectly say that this interface exists
in the FileSystem class as well, and say "this is more helpful with DFS", which is the only
implementation.
# I think you should change the LimitedPrivate InterfaceAudience annotations to Public, but
keep the Unstable InterfaceStability annotations.
# Put a single space around your operators, e.g. "for (int i=0; i<blocks.size(); i++)"
# Unless I'm missing something, I don't think I see the ability to disable this feature, let
alone it being off by default, as Arun requested.
                
> Expose disk-location information for blocks to enable better scheduling
> -----------------------------------------------------------------------
>
>                 Key: HDFS-3672
>                 URL: https://issues.apache.org/jira/browse/HDFS-3672
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.0.0-alpha
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: design-doc-v1.pdf, hdfs-3672-1.patch, hdfs-3672-2.patch, hdfs-3672-3.patch,
hdfs-3672-4.patch, hdfs-3672-5.patch
>
>
> Currently, HDFS exposes on which datanodes a block resides, which allows clients to make
scheduling decisions for locality and load balancing. Extending this to also expose on which
disk on a datanode a block resides would enable even better scheduling, on a per-disk rather
than coarse per-datanode basis.
> This API would likely look similar to Filesystem#getFileBlockLocations, but also involve
a series of RPCs to the responsible datanodes to determine disk ids.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message