hadoop-hdfs-issues mailing list archives

From "Benoy Antony (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-7466) Allow different values for dfs.datanode.balance.max.concurrent.moves per datanode
Date Tue, 02 Dec 2014 19:30:14 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231969#comment-14231969 ]

Benoy Antony edited comment on HDFS-7466 at 12/2/14 7:29 PM:
-------------------------------------------------------------

There are two ways to do this:

Approach 1: The mover/balancer queries each datanode to obtain this value. This can be done
via the datanode's HTTP port by invoking its "conf" servlet. The drawback is that the
mover/balancer needs to contact each of the datanodes. This can be implemented as a plugin
whose default implementation reads the value from the local configuration, which is enough
for a cluster that doesn't need this accuracy.

Approach 2: The mover/balancer obtains this value from the namenode. The drawback is that
the value needs to be sent in the DN heartbeat and the namenode has to keep track of it.
This seems like overkill for a value that will be the same in most clusters. Also, the
value is useful only to the balancer/mover.

I plan to implement Approach 1, since the mover/balancer already communicates with all the
datanodes to schedule the move operations.
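
A minimal sketch of what the lookup in Approach 1 might look like, not the actual patch. It
assumes the datanode's "conf" servlet is reachable over plain HTTP at /conf and returns the
usual Hadoop configuration XML (<property><name>/<value>). The class and method names are
hypothetical, and secured clusters (HTTPS, SPNEGO) are not handled. On any failure it falls
back to the caller's local value, which matches the default behaviour described above.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper; the names here are illustrative, not part of Hadoop.
class DatanodeMovesLookup {
    static final String KEY = "dfs.datanode.balance.max.concurrent.moves";

    /** Reads KEY from Hadoop-style configuration XML; returns fallback if absent or invalid. */
    static int parseMaxMoves(InputStream confXml, int fallback) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(confXml);
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            if (KEY.equals(childText(prop, "name"))) {
                String value = childText(prop, "value");
                try {
                    return value == null ? fallback : Integer.parseInt(value.trim());
                } catch (NumberFormatException e) {
                    return fallback; // malformed value on the datanode
                }
            }
        }
        return fallback; // property not set on the datanode
    }

    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    /** Queries one datanode; the /conf path and plain HTTP are assumptions of this sketch. */
    static int fetchMaxMoves(String host, int httpPort, int localValue) {
        try {
            URL url = new URL("http://" + host + ":" + httpPort + "/conf?format=xml");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            try (InputStream in = conn.getInputStream()) {
                return parseMaxMoves(in, localValue);
            } finally {
                conn.disconnect();
            }
        } catch (Exception e) {
            return localValue; // datanode unreachable: use the balancer's own setting
        }
    }

    public static void main(String[] args) throws Exception {
        // Offline demonstration with a made-up configuration document.
        String xml = "<configuration><property><name>" + KEY
                + "</name><value>20</value></property></configuration>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(parseMaxMoves(in, 5));
    }
}
```

The mover/balancer would call fetchMaxMoves once per datanode, passing its own configured
value as the fallback, so a cluster with uniform settings behaves exactly as it does today.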



> Allow different values for dfs.datanode.balance.max.concurrent.moves per datanode
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-7466
>                 URL: https://issues.apache.org/jira/browse/HDFS-7466
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: balancer & mover
>    Affects Versions: 2.6.0
>            Reporter: Benoy Antony
>            Assignee: Benoy Antony
>
> It is possible to configure different values for  _dfs.datanode.balance.max.concurrent.moves_
> per datanode. However, the balancer/mover uses the value from its own configuration
> rather than the one configured on each datanode.
> The correct approach is to obtain the value from the datanode itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
