hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3672) Expose disk-location information for blocks to enable better scheduling
Date Thu, 09 Aug 2012 13:39:19 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431802#comment-13431802 ]

Aaron T. Myers commented on HDFS-3672:
--------------------------------------

bq. Why is this API marked @InterfaceAudience.Public? I think we should remove it and just
leave @InterfaceStability.Unstable.

I was under the impression that all public classes needed to have an @InterfaceAudience annotation,
and all public classes needed to have an @InterfaceStability annotation unless they're marked
@InterfaceAudience.Private. Am I wrong about that?
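To make the rule in question concrete, here is a small Java sketch that checks it by reflection, encoding it exactly as phrased above (this is the commenter's understanding, not a confirmed Hadoop policy). Hadoop's real annotations live in org.apache.hadoop.classification; to keep the example self-contained, the InterfaceAudience and InterfaceStability classes below are local stand-ins carrying only a subset of the real members, and AudienceRule and Examples are hypothetical names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Modifier;

// Local stand-ins shaped like Hadoop's InterfaceAudience / InterfaceStability
// (org.apache.hadoop.classification) so the sketch compiles without Hadoop.
final class InterfaceAudience {
  @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE) @interface Public {}
  @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE) @interface Private {}
  private InterfaceAudience() {}
}

final class InterfaceStability {
  @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE) @interface Stable {}
  @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE) @interface Unstable {}
  private InterfaceStability() {}
}

// Hypothetical checker encoding the rule as stated in the comment.
final class AudienceRule {
  static boolean hasAudience(Class<?> c) {
    return c.isAnnotationPresent(InterfaceAudience.Public.class)
        || c.isAnnotationPresent(InterfaceAudience.Private.class);
  }

  static boolean hasStability(Class<?> c) {
    return c.isAnnotationPresent(InterfaceStability.Stable.class)
        || c.isAnnotationPresent(InterfaceStability.Unstable.class);
  }

  // Public classes need an audience annotation, plus a stability annotation
  // unless the audience is Private. Non-public classes are not covered.
  static boolean follows(Class<?> c) {
    if (!Modifier.isPublic(c.getModifiers())) {
      return true;
    }
    if (!hasAudience(c)) {
      return false;
    }
    return c.isAnnotationPresent(InterfaceAudience.Private.class) || hasStability(c);
  }
}

// Example classes: the first carries both annotations the review comment
// describes, the second follows the suggestion to drop the audience
// annotation, and the third is a private helper.
final class Examples {
  @InterfaceAudience.Public @InterfaceStability.Unstable
  public static class AnnotatedApi {}

  @InterfaceStability.Unstable
  public static class StabilityOnlyApi {}

  @InterfaceAudience.Private
  public static class InternalHelper {}
}
```

Under this reading of the rule, a public class carrying only @InterfaceStability.Unstable fails the check, which is why dropping the audience annotation would appear to conflict with it.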

bq. The configuration to turn off this functionality should be on the server side also. Otherwise
a client can just enable this functionality without the admin having control over it.

I thought about this a fair bit while reviewing the code. Arun's stated reason for wanting this
feature disabled by default was "so that people who use this understand that this isn't
necessarily supported." My conclusion is that a client-side-only config serves that purpose.
Making the config server-side as well would only require the admin to enable it and restart
the cluster before any client that wants to try this functionality can give it a shot. That
seems to me an unnecessary pain for both the admin and the user, and it doesn't further Arun's
stated goal. For that matter, why would an admin want to prevent clients from calling this API?
If you insist on having a server-side config for this, I'd suggest two separate configs: a
server-side one that defaults to enabled, so that an admin may consciously disable it, and a
client-side one that defaults to disabled, so that users of this API must consciously configure
their client. The client-side default supports Arun's stated goal of making sure people are
aware that this is an experimental API.
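The two-config proposal can be sketched as a single gate: the call is allowed only when the client has opted in and the server has not opted out. This is a minimal illustration assuming configuration is a plain string map; the class name DiskLocationGate and both property keys are hypothetical, not HDFS's actual configuration.

```java
import java.util.Map;

// Sketch of the two-config proposal. Both keys are hypothetical,
// not real HDFS configuration properties.
final class DiskLocationGate {
  static final String SERVER_KEY = "example.disk-locations.server.enabled";
  static final String CLIENT_KEY = "example.disk-locations.client.enabled";

  private DiskLocationGate() {}

  // Reads a boolean flag, falling back to the given default when unset.
  static boolean flag(Map<String, String> conf, String key, boolean defaultValue) {
    String value = conf.get(key);
    return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
  }

  // Server side: on by default, so an admin must consciously disable it.
  static boolean serverAllows(Map<String, String> serverConf) {
    return flag(serverConf, SERVER_KEY, true);
  }

  // Client side: off by default, so a user must consciously opt in.
  static boolean clientOptedIn(Map<String, String> clientConf) {
    return flag(clientConf, CLIENT_KEY, false);
  }

  // Permitted only when the client opted in and the server has not opted out.
  static boolean callPermitted(Map<String, String> clientConf, Map<String, String> serverConf) {
    return clientOptedIn(clientConf) && serverAllows(serverConf);
  }
}
```

With no server-side setting, a client that sets its key to "true" gets through without any cluster restart; the admin blocks it only by explicitly setting the server key to "false".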
                
> Expose disk-location information for blocks to enable better scheduling
> -----------------------------------------------------------------------
>
>                 Key: HDFS-3672
>                 URL: https://issues.apache.org/jira/browse/HDFS-3672
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.0.0-alpha
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: design-doc-v1.pdf, design-doc-v2.pdf, hdfs-3672-1.patch, hdfs-3672-2.patch,
>                      hdfs-3672-3.patch, hdfs-3672-4.patch, hdfs-3672-5.patch, hdfs-3672-6.patch,
>                      hdfs-3672-7.patch, hdfs-3672-8.patch
>
>
> Currently, HDFS exposes which datanodes a block resides on, which allows clients to make
> scheduling decisions for locality and load balancing. Extending this to also expose which
> disk on a datanode a block resides on would enable even better scheduling, on a per-disk
> rather than a coarse per-datanode basis.
> This API would likely look similar to FileSystem#getFileBlockLocations, but would also involve
> a series of RPCs to the responsible datanodes to determine disk ids.
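As a rough illustration of what per-disk scheduling could look like on the client once disk ids are available, here is a minimal Java sketch. It assumes each replica is reported as a (host, disk id) pair; DiskReplica and DiskAwareScheduler are hypothetical names, not part of HDFS, and the greedy "fewest reads already assigned to that disk" rule is just one possible policy.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A replica location extended with a disk id. Both the class and the
// per-datanode disk numbering are hypothetical, for illustration only.
final class DiskReplica {
  final String host;
  final int diskId;

  DiskReplica(String host, int diskId) {
    this.host = host;
    this.diskId = diskId;
  }

  String diskKey() {
    return host + "#" + diskId;
  }
}

// Greedy per-disk balancing: send each read to the replica whose disk has
// the fewest reads already assigned by this scheduler.
final class DiskAwareScheduler {
  private final Map<String, Integer> assigned = new HashMap<>();

  int loadOf(DiskReplica r) {
    return assigned.getOrDefault(r.diskKey(), 0);
  }

  DiskReplica assign(List<DiskReplica> replicas) {
    if (replicas.isEmpty()) {
      throw new IllegalArgumentException("block has no replicas");
    }
    DiskReplica best = replicas.get(0);
    for (DiskReplica r : replicas) {
      if (loadOf(r) < loadOf(best)) {
        best = r; // strictly less, so ties keep the earlier replica
      }
    }
    assigned.merge(best.diskKey(), 1, Integer::sum);
    return best;
  }
}
```

For example, after reads land on dn1 disk 0 and dn2 disk 1, a block with replicas on dn1 disk 1 and dn2 disk 1 goes to dn1 disk 1: both datanodes already carry one read, so a per-datanode count cannot tell them apart, while the per-disk count shows that dn1's disk 1 is idle.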

--
This message is automatically generated by JIRA.

       
