hadoop-hdfs-issues mailing list archives

From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10531) Add EC policy and storage policy related usage summarization function to dfs du command
Date Thu, 03 Nov 2016 03:46:58 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15631433#comment-15631433 ]

Andrew Wang commented on HDFS-10531:

Thanks for working on this, Wei-Chiu. I have some code review comments, but would like to start
by asking about the use case. We already have {{hadoop fs -count -q}}, which shows us usage
by storage type. If we want to do this for EC too, I'd prefer we add functionality to the
{{Count}} command.

If instead we want some way of printing out the EC policy set on paths, that seems like it
belongs in {{ls}} or even {{hdfs erasurecode}}.

Since EC doesn't relate to quota though, I think normally admins will be the ones who care
about how much data is EC, and at a cluster level. Cluster-wide stats are thus a better place
for these numbers.

Code comments:

* In ContentSummaryComputationContext, let's try to follow the recommended modifier order:
* Should we say "redundancy" rather than "replication" in the docs / help? e.g. {{disk_space_consumed_with_all_replicas}}
and {{Display storage and replication policy}}.
* There is commented-out new code in INodeFile
* I think it'd look better to put the new columns at the end. This is less likely to break
parsing scripts too, since they normally parse left-to-right. Since this output is tab-delimited,
we also don't need brackets.
* Could you add a test for "-s" with "-t" ?
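
To illustrate the point about column placement, here is a minimal Python sketch (not part of the patch). The column layout, the sample lines, and the name {{parse_du_line}} are assumptions made for illustration only; the real {{du}} output may differ. It shows a script that reads fields left to right and keeps working when new columns are appended, but fails when they are prepended.

```python
def parse_du_line(line):
    """Parse the two leading fields a legacy script expects: size and path.

    Assumes a hypothetical tab-delimited layout of "<size>\t<path>[\t<extra>...]".
    """
    fields = line.rstrip("\n").split("\t")
    size = int(fields[0])   # first column: size in bytes
    path = fields[1]        # second column: path
    return size, path


# Hypothetical sample lines, not real command output.
old_line = "104857600\t/home/rgao/ds"
appended_line = "104857600\t/home/rgao/ds\tARCHIVE\tRS-DEFAULT-6-3-64k"
prefixed_line = "ARCHIVE\tRS-DEFAULT-6-3-64k\t104857600\t/home/rgao/ds"

# Appending columns leaves the leading fields where the script expects them.
assert parse_du_line(appended_line) == parse_du_line(old_line)

# Prepending columns shifts every field, so the same script breaks.
try:
    parse_du_line(prefixed_line)
    prefixed_ok = True
except ValueError:
    prefixed_ok = False
```

Here {{prefixed_ok}} ends up false because the first field is no longer numeric, which is the kind of breakage that putting new columns at the end avoids.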

> Add EC policy and storage policy related usage summarization function to dfs du command
> ---------------------------------------------------------------------------------------
>                 Key: HDFS-10531
>                 URL: https://issues.apache.org/jira/browse/HDFS-10531
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Rui Gao
>            Assignee: Wei-Chiu Chuang
>         Attachments: HDFS-10531.001.patch
> Current du command output:
> {code}
>         [ ~]$ hdfs dfs -du  -h /home/rgao/
>         0      /home/rgao/.Trash
>         0      /home/rgao/.staging
>         100 M  /home/rgao/ds
>         250 M  /home/rgao/ds-2
>         200 M  /home/rgao/noECBackup-ds
>         500 M  /home/rgao/noECBackup-ds-2
> {code}
> For HDFS users and administrators, a usage summary broken down by EC policy and storage policy
> would be very helpful when managing cluster storage. A mock-up of the du output could look
> like the following.
> {code}
>         [ ~]$ hdfs dfs -du -h -t (total; parameter to be added) /home/rgao
>         0      /home/rgao/.Trash
>         0      /home/rgao/.staging
>         [Archive] [EC:RS-DEFAULT-6-3-64k]  100 M  /home/rgao/ds
>         [DISK]    [EC:RS-DEFAULT-6-3-64k]  250 M  /home/rgao/ds-2
>         [DISK]    [Replica]                200 M  /home/rgao/noECBackup-ds
>         [DISK]    [Replica]                500 M  /home/rgao/noECBackup-ds-2
>         Total:
>         [Archive] [EC:RS-DEFAULT-6-3-64k]  100 M
>         [Archive] [Replica]                  0 M
>         [DISK]    [EC:RS-DEFAULT-6-3-64k]  250 M
>         [DISK]    [Replica]                700 M
>         [Archive] [ALL]                    100 M
>         [DISK]    [ALL]                    950 M
>         [ALL]     [EC:RS-DEFAULT-6-3-64k]  350 M
>         [ALL]     [Replica]                700 M
> {code}     
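
The "Total" section of the mock-up above is a roll-up of the per-directory rows by storage type and redundancy scheme. A minimal Python sketch of that arithmetic follows; it is not the patch's Java code, the names {{summarize}} and {{rows}} are hypothetical, and the sizes (in MB) are copied from the example.

```python
from collections import defaultdict


def summarize(rows):
    """Sum sizes per (storage, redundancy) pair, plus an "ALL" roll-up on each axis."""
    totals = defaultdict(int)
    for storage, redundancy, size_mb in rows:
        totals[(storage, redundancy)] += size_mb   # exact combination
        totals[(storage, "ALL")] += size_mb        # all redundancy schemes on this storage
        totals[("ALL", redundancy)] += size_mb     # all storage types for this scheme
    return dict(totals)


# Per-directory rows from the mock-up: (storage type, redundancy, size in MB).
rows = [
    ("Archive", "EC:RS-DEFAULT-6-3-64k", 100),
    ("DISK", "EC:RS-DEFAULT-6-3-64k", 250),
    ("DISK", "Replica", 200),
    ("DISK", "Replica", 500),
]

totals = summarize(rows)
for key in sorted(totals):
    print(key, totals[key], "M")
```

Combinations with no data, such as Archive with Replica, simply do not appear in the result; a real implementation would have to decide whether to print them as 0.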

This message was sent by Atlassian JIRA
