hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Anu Engineer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10551) o.a.h.h.s.diskbalancer.command.Command does not actually verify options as expected.
Date Wed, 22 Jun 2016 19:10:13 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344990#comment-15344990

Anu Engineer commented on HDFS-10551:

 [~liuml07] Thanks for your comments.

bq. In Command constructor, // These arguments are valid for all commands. can be deleted.
I left there for the future maintainers. Earlier we had a command like -help on each command,
but now we have moved it to an independent -help command. 
So you have no reference to how to use this. I think this comment helps with someone how wants
to add an option to all commands in disk balancer. please let me know if you disagree and
we must remove this comment.

bq. In TestDiskBalancerCommand other test cases can also use the class filed conf (e.g. testReadClusterFromJson()).
This will also address the checkstyle warning I think.
Agreed, the latest patch fixed that issue and we don't have any more checkstyle warnings

bq. For the config key name dfs.disk.balancer.max.disk.throughputInMBperSec, I'd suggest dfs.disk.balancer.max.disk.bandwidthPerSec.
I think throughput and bandwidth have different meanings in different context and bandwidth
here is better. This also conforms to existing names like dfs.image.transfer.bandwidthPerSec,
dfs.balance.bandwidthPerSec and dfs.datanode.balance.bandwidthPerSec, to name a few.

if my memory serves me correctly it was named something like that because a code review comment
wanted to make sure that Units are expressed as part of the name. Names like bandwithPerSec
does not tell you if it is bytes or MB or GB that is measured. Also this is setting is exposed
to users via --bandwidth on command line. 
My understanding about the nomenclature -- Not a native english language speaker -- Bandwidth
is the maximum channel capacity, throughput represents the speed at which things move through
a system. if my understanding is correct, then the current naming seems to be more apt. I
do see that we have used bandwidthPerSec without units in earlier config properties. But I
am conflicted whether we should follow that pattern or use one which is more explicit.

if you feel strongly about this please file a JIRA and I will make a follow up fix.

> o.a.h.h.s.diskbalancer.command.Command does not actually verify options as expected.
> ------------------------------------------------------------------------------------
>                 Key: HDFS-10551
>                 URL: https://issues.apache.org/jira/browse/HDFS-10551
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>            Reporter: Lei (Eddy) Xu
>            Assignee: Anu Engineer
>            Priority: Critical
>         Attachments: HDFS-10551-HDFS-1312.001.patch, HDFS-10551-HDFS-1312.002.patch,
> In {{diskbalancer.command.Command#verifyCommandOptions}}. The following code does not
do what it expected to do:
> {code}
> if (!validArgs.containsKey(opt.getArgName())) {
> {code}
> opt.getArgName() always returns "arg" instead of i.e., {{report}} or {{uri}}, which is
the expected parameter to check.
> It should use {{opt.getLongOpt()}} to get the option names. It can pass on the branch
because {{opt.getArgName()}} always returns {{"arg"}}, which is accidently in {{validArgs}}.
However I don't think it is the intention for this function.
> Additionally, in the following code
> {code}
> validArguments.append("Valid arguments are : %n");
> {code}
> This {{%n}} is not used.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

View raw message