Date: Thu, 28 Jul 2011 18:26:09 +0000 (UTC)
From: "Eric Payne (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-2202) Changes to balancer bandwidth should not require datanode restart.

[ https://issues.apache.org/jira/browse/HDFS-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072468#comment-13072468 ]

Eric Payne commented on HDFS-2202:
----------------------------------

Hi Nicholas,

Thank you for reviewing this Jira. Your comments were clear, precise, and easily understood. I appreciate that.

> Hi Eric, sorry that the refactoring breaks your patch. Could you update it?

Yes. It has been updated.

> In TestBalancerBandwidth, you may call MiniDFSCluster.getFileSystem() instead of creating a DFSClient.

Done.

> We should update ClientProtocol.versionID and DatanodeProtocol.versionID.
> I think the BalancerBandwidthCommand.version is not needed. We have to change the DatanodeProtocol.versionID in this case.

I did this in the 0.23.0 patch. However, one of the requirements for the 0.20.205.0 patch was not to modify DatanodeProtocol.versionID (please see https://issues.apache.org/jira/browse/HDFS-2171?focusedCommentId=13068990&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13068990). The operations team does not want to require every cluster in a colo to be upgraded to 0.20.205, which would be necessary if DatanodeProtocol.versionID changed, because there are some cross-cluster use cases. So in 0.20.205 I kept BalancerBandwidthCommand.version. In 0.23, DatanodeProtocol.versionID has to change anyway, so changing it there makes sense.

> You may use for-each statement for the following
> (... foreach example code here...)

Done.

> The initial capacity does not really matter. How about removing it?

Done.

> Please add getter/setter and do not use public field DatanodeDescriptor.bandwidth.

Done.

> Please add javadoc (or change comments to javadoc) to all new public classes/methods/fields.
Done.

> Changes to balancer bandwidth should not require datanode restart.
> ------------------------------------------------------------------
>
> Key: HDFS-2202
> URL: https://issues.apache.org/jira/browse/HDFS-2202
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer, data-node
> Affects Versions: 0.20.205.0, 0.23.0
> Reporter: Eric Payne
> Assignee: Eric Payne
> Fix For: 0.20.205.0, 0.23.0
>
> Attachments: HDFS-2171.patch, HDFS-2202.0.20.205.0.v1.patch, HDFS-2202.0.23.0.v1.patch, HDFS-2202.patch
>
>
> Currently, to change the balancer bandwidth (dfs.datanode.balance.bandwidthPerSec), the datanode daemon must be restarted.
> The optimal value of bandwidthPerSec is almost never known at cluster startup; it becomes clear only once a new node is placed in the cluster and balancing has begun. If balancing takes too long (bandwidthPerSec is too low) or consumes too much bandwidth (bandwidthPerSec is too high), the cluster must enter a "maintenance window", during which it is unusable while all of the datanodes are bounced. In large clusters of thousands of nodes, this is a real maintenance problem: these maintenance windows can take a long time, and several may be needed while bandwidthPerSec is tuned experimentally.
> A possible solution would be to add a -bandwidth parameter to the balancer tool. If it is supplied, the value would be passed to the datanodes via the OP_REPLACE_BLOCK and OP_COPY_BLOCK DataTransferProtocol requests. This would, however, require changing the DataTransferProtocol version.
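To make the "no restart" requirement concrete, here is a minimal sketch of a datanode-side throttler whose limit can be changed while the daemon keeps running, instead of being read once from dfs.datanode.balance.bandwidthPerSec at startup. The class `BalancerThrottler`, its methods, and the one-second accounting window are hypothetical illustrations, not the HDFS-2202 patch or any Hadoop API. It also reflects the review requests above: state is private behind a getter/setter, and public methods carry javadoc. How a new value reaches the datanode (a balancer option, a namenode command, or something else) is out of scope here.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical throttler for balancer traffic whose limit can be changed
 * at runtime, so no datanode restart is needed.
 */
final class BalancerThrottler {
    /** Length of one accounting window; idle time older than this is not banked. */
    private static final long WINDOW_NANOS = TimeUnit.SECONDS.toNanos(1);

    private long bytesPerSecond;   // current limit, guarded by "this"
    private long windowStartNanos; // start of the current accounting window
    private long bytesInWindow;    // bytes reported since windowStartNanos

    /** Creates a throttler with the given initial limit in bytes per second. */
    public BalancerThrottler(long bytesPerSecond) {
        setBandwidth(bytesPerSecond);
    }

    /** Returns the current limit in bytes per second. */
    public synchronized long getBandwidth() {
        return bytesPerSecond;
    }

    /**
     * Changes the limit at runtime. A fresh accounting window is started
     * (bytes counted under the old limit are forgotten) and waiting threads
     * are woken so they re-check against the new limit.
     */
    public synchronized void setBandwidth(long bytesPerSecond) {
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("bandwidth must be positive: " + bytesPerSecond);
        }
        this.bytesPerSecond = bytesPerSecond;
        this.windowStartNanos = System.nanoTime();
        this.bytesInWindow = 0;
        notifyAll();
    }

    /** Records numBytes as sent and blocks until the window's average rate is within the limit. */
    public synchronized void throttle(long numBytes) throws InterruptedException {
        bytesInWindow += numBytes;
        while (true) {
            // Recomputed on every pass so a concurrent setBandwidth() takes effect immediately.
            long remaining = windowStartNanos + nanosFor(bytesInWindow, bytesPerSecond)
                    - System.nanoTime();
            if (remaining <= 0) {
                break;
            }
            TimeUnit.NANOSECONDS.timedWait(this, remaining); // releases the monitor while waiting
        }
        // Start a new window once a full window has elapsed.
        if (System.nanoTime() - windowStartNanos >= WINDOW_NANOS) {
            windowStartNanos = System.nanoTime();
            bytesInWindow = 0;
        }
    }

    /** Nanoseconds needed to send {@code bytes} at {@code bytesPerSecond}, rounded up. */
    static long nanosFor(long bytes, long bytesPerSecond) {
        return (long) Math.ceil(bytes * 1e9 / bytesPerSecond);
    }
}
```

A transfer thread would call `throttle(n)` after sending each chunk of `n` bytes, while an administrative path calls `setBandwidth(...)` on the same instance; the next `throttle` call already sees the new rate. Waiting with `timedWait` (rather than sleeping while holding the lock) keeps `setBandwidth` from blocking behind a throttled sender.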