hadoop-hdfs-issues mailing list archives

From Arun C Murthy <...@hortonworks.com>
Subject Re: [jira] [Commented] (HDFS-3672) Expose disk-location information for blocks to enable better scheduling
Date Sun, 12 Aug 2012 04:30:43 GMT
Jira is down, so I'll comment here....

I'd really encourage you to enforce this in the DataNode, throwing an
UnsupportedOperationException when the feature is disabled, rather than gating it
only via a client-side config.
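
To make the suggestion concrete, here is a minimal sketch of a server-side check. It is not the HDFS-3672 patch: the class name, the config key, and the placeholder disk-id mapping are all hypothetical, and a plain Map stands in for the DataNode's configuration.

```java
import java.util.Map;

/** Hypothetical stand-in for a DataNode RPC handler; not actual HDFS code. */
class BlockMetadataService {
    // Hypothetical key name, chosen for illustration only.
    static final String ENABLED_KEY = "example.datanode.block-metadata.enabled";

    private final boolean enabled;

    BlockMetadataService(Map<String, String> serverConf) {
        // Disabled unless the admin explicitly turns it on in the server-side config.
        this.enabled = Boolean.parseBoolean(serverConf.getOrDefault(ENABLED_KEY, "false"));
    }

    /** Returns a placeholder disk id for a block, or rejects the call when disabled. */
    int getDiskId(long blockId) {
        if (!enabled) {
            throw new UnsupportedOperationException(
                "Block disk-location queries are disabled on this DataNode; set "
                + ENABLED_KEY + "=true to enable them");
        }
        // Placeholder mapping: a real implementation would look up the block's volume.
        return (int) Math.floorMod(blockId, 4L);
    }
}
```

With the check on the server, a client that flips its own config still gets a clear failure from any DataNode where the admin has not enabled the feature.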


On Aug 9, 2012, at 6:39 AM, Aaron T. Myers (JIRA) wrote:

>    [ https://issues.apache.org/jira/browse/HDFS-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431802#comment-13431802
> Aaron T. Myers commented on HDFS-3672:
> --------------------------------------
> bq. Why is this API marked @InterfaceAudience.Public? I think we should remove
> it and just leave InterfaceStability.Unstable.
> I was under the impression that all public classes needed to have an
> @InterfaceAudience annotation, and all public classes needed to have an
> @InterfaceStability annotation unless they're marked
> @InterfaceAudience.Private. Am I wrong about that?
> bq. Configuration to turn off this functionality should be on the server side
> also. Otherwise a client can just enable this functionality without the admin
> having control over it.
> I thought about this a fair bit while reviewing the code. The conclusion that
> I came to is that the stated reason that Arun wanted this feature disabled by
> default was "so that people who use this understand that this isn't
> necessarily supported." A client-side-only config seems to serve that purpose.
> Making this config server side as well only serves to require the admin to
> enable the config and restart their cluster before some client that wants to
> try this functionality can give it a shot. That seems to me to be a strictly
> unnecessary pain for both the admin and the user that doesn't further Arun's
> stated goal. For that matter, why would an admin want to prevent clients from
> calling this API? If you insist on having a server-side config for this, I'd
> like to suggest having two separate configs: a server-side one that defaults
> to enabled, so that an admin may consciously disable it, and a client-side
> config that defaults to disabled, so that users of this API must consciously
> configure their client, to support Arun's stated goal of making sure people
> are aware that it's an experimental
>> Expose disk-location information for blocks to enable better scheduling
>> -----------------------------------------------------------------------
>>                Key: HDFS-3672
>>                URL: https://issues.apache.org/jira/browse/HDFS-3672
>>            Project: Hadoop HDFS
>>         Issue Type: Improvement
>>   Affects Versions: 2.0.0-alpha
>>           Reporter: Andrew Wang
>>           Assignee: Andrew Wang
>>        Attachments: design-doc-v1.pdf, design-doc-v2.pdf, hdfs-3672-1.patch, hdfs-3672-2.patch,
>>                     hdfs-3672-3.patch, hdfs-3672-4.patch, hdfs-3672-5.patch, hdfs-3672-6.patch, hdfs-3672-7.patch,
>> Currently, HDFS exposes which datanodes a block resides on, which allows
>> clients to make scheduling decisions for locality and load balancing.
>> Extending this to also expose which disk on a datanode a block resides on
>> would enable even better scheduling, on a per-disk rather than coarse
>> per-datanode basis.
>> This API would likely look similar to FileSystem#getFileBlockLocations, but
>> also involve a series of RPCs to the responsible datanodes to determine disk
>> ids.

Arun C. Murthy
Hortonworks Inc.
