hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5446) Consider supporting a mechanism to allow datanodes to drain outstanding work during rolling upgrade
Date Thu, 13 Feb 2014 13:50:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900344#comment-13900344 ]

Kihwal Lee commented on HDFS-5446:
----------------------------------

Once the OOB (out-of-band) acking feature is in place, I believe it will be easier to have a DN
tell writers to move elsewhere. Although this is less useful for rolling upgrades, it can solve
the problem of decommissioning nodes that have long-running, slow writers. Clients will be able
to migrate their writes to another node, so even blocks with a single replica will continue to work.

> Consider supporting a mechanism to allow datanodes to drain outstanding work during rolling upgrade
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-5446
>                 URL: https://issues.apache.org/jira/browse/HDFS-5446
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>    Affects Versions: 2.2.0
>            Reporter: Nathan Roberts
>
> Rebuilding write pipelines is expensive and this can happen many times during a rolling
> restart of datanodes (i.e. during a rolling upgrade). It seems like it might help if datanodes
> could be told to drain current work while rejecting new requests - possibly with a new response
> indicating the node is temporarily unavailable (it's not broken, it's just going through a
> maintenance phase where it shouldn't accept new work).
> Waiting just a few seconds is normally enough to clear up a good percentage of the open
> requests without error, thus reducing the overhead associated with restarting lots of datanodes
> in rapid succession.
> Obviously, this would need a timeout to make sure the datanode doesn't wait forever.
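
The drain behaviour described in the issue (stop admitting new requests, let in-flight work finish, give up after a timeout) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class `DrainGate` and its methods `tryBegin`, `end`, and `drain` are hypothetical names invented for this sketch and are not part of the HDFS code base or of any patch on this issue.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical admission gate for a node that can be asked to drain. */
final class DrainGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition idle = lock.newCondition();
    private int inFlight = 0;          // requests currently being served
    private boolean draining = false;  // true once drain() has been called

    /** Admits a request, or returns false if the node is draining
     *  (the caller would then report "temporarily unavailable"). */
    boolean tryBegin() {
        lock.lock();
        try {
            if (draining) {
                return false;
            }
            inFlight++;
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Marks one admitted request as finished. */
    void end() {
        lock.lock();
        try {
            inFlight--;
            if (inFlight == 0) {
                idle.signalAll();
            }
        } finally {
            lock.unlock();
        }
    }

    /** Rejects new requests and waits up to timeoutMillis for in-flight
     *  requests to finish. Returns true if everything drained in time. */
    boolean drain(long timeoutMillis) {
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        lock.lock();
        try {
            draining = true;
            while (inFlight > 0) {
                if (nanos <= 0L) {
                    return false;  // timed out; caller may shut down anyway
                }
                nanos = idle.awaitNanos(nanos);  // returns remaining wait time
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve interrupt status
            return false;
        } finally {
            lock.unlock();
        }
    }
}
```

In this sketch a request handler would call `tryBegin()` before starting work and `end()` when done, while the shutdown path would call `drain(...)` with a bound of a few seconds and proceed with the restart whether or not it returns true.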



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
