hadoop-hdfs-issues mailing list archives

From "SammiChen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10531) Add EC policy and storage policy related usage summarization function to dfs du command
Date Wed, 05 Apr 2017 07:33:41 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956441#comment-15956441 ]

SammiChen commented on HDFS-10531:
----------------------------------

bq. If instead we want some way of printing out the EC policy set on paths, that seems like
it belongs in {{ls}} or even {{hdfs erasurecode}}.
Agreed. I will file a new JIRA to improve {{ls}} so that it shows the EC policy alongside the
replication factor. It will reuse the current "replication factor" column to show the EC policy name.
{quote}
 hdfs dfs -ls /
Found 5 items
-rw-r--r--   3 root supergroup       1366 2017-03-15 16:51 /README.txt
drwxr-xr-x   - root supergroup          0 2017-03-16 15:54 /benchmarks
drwxr-xr-x   - root supergroup          0 2017-04-05 14:10 /home
drwxr-xr-x   - root supergroup          0 2017-03-16 16:16 /system
drwx------   - root supergroup          0 2017-03-07 14:08 /tmp
{quote}
From the user's point of view, putting this function in "ls" is better than putting it in the
"ec" command, because "ls" already has a column that shows the file replication factor. EC is
one form of file redundancy, like replication, so it is natural to show a file's EC policy there.
However, it will make the "ec -getPolicy" subcommand somewhat redundant.
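
To make the idea concrete, here is a minimal sketch of how the second column of an "ls" line could carry either the replication factor or the EC policy name. It is only an illustration: format_ls_line and its parameters are hypothetical names, not part of the Hadoop code base, and the column widths are just copied from the listing above.

```python
def format_ls_line(perms, replication, ec_policy, owner, group, size, mtime, path):
    """Format one ls-style line (hypothetical helper, not an HDFS API).

    The second column shows the EC policy name for erasure-coded files,
    the replication factor for replicated files, and "-" for directories.
    """
    if ec_policy is not None:
        second = ec_policy
    elif replication is None:
        second = "-"
    else:
        second = str(replication)
    # Widths mirror the sample listing: 3 for the second column, 10 for size.
    return f"{perms} {second:>3} {owner} {group} {size:>10} {mtime} {path}"


if __name__ == "__main__":
    # A replicated file, then an erasure-coded file under an assumed path.
    print(format_ls_line("-rw-r--r--", 3, None, "root", "supergroup",
                         1366, "2017-03-15 16:51", "/README.txt"))
    print(format_ls_line("-rw-r--r--", None, "RS-DEFAULT-6-3-64k", "root",
                         "supergroup", 1366, "2017-04-05 14:10", "/home/ec.txt"))
```

A policy name is wider than the current three-character column, so a real change would also have to decide how to keep the remaining columns aligned.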

As for this JIRA, since an EC file is no different from a 3-way replicated file from a quota
point of view, it is not clear what users would gain from knowing how much quota is used by
each type of EC policy. So I do not recommend adding "EC" information to the "hdfs dfs -count"
command.
Cluster-wide stats are helpful. And in a multi-tenant cluster environment, per-directory
stats will also be helpful. So having an EC policy summary in the "du" command can help users.

[~andrew.wang], what do you think? 




> Add EC policy and storage policy related usage summarization function to dfs du command
> ---------------------------------------------------------------------------------------
>
>                 Key: HDFS-10531
>                 URL: https://issues.apache.org/jira/browse/HDFS-10531
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Rui Gao
>            Assignee: SammiChen
>              Labels: hdfs-ec-3.0-nice-to-have
>         Attachments: HDFS-10531.001.patch
>
>
> Currently du command output:
> {code}
>         [ ~]$ hdfs dfs -du  -h /home/rgao/
>         0      /home/rgao/.Trash
>         0      /home/rgao/.staging
>         100 M  /home/rgao/ds
>         250 M  /home/rgao/ds-2
>         200 M  /home/rgao/noECBackup-ds
>         500 M  /home/rgao/noECBackup-ds-2
> {code}
> For HDFS users and administrators, a usage summary by EC policy and storage policy
> would be very helpful when managing cluster storage. A mock-up of the du output could
> look like the following.
> {code}
>         [ ~]$ hdfs dfs -du  -h -t( total, parameter to be added) /home/rgao
>          
>         0      /home/rgao/.Trash
>         0      /home/rgao/.staging
>         [Archive] [EC:RS-DEFAULT-6-3-64k]  100 M  /home/rgao/ds
>         [DISK]    [EC:RS-DEFAULT-6-3-64k]  250 M  /home/rgao/ds-2
>         [DISK]    [Replica]                200 M  /home/rgao/noECBackup-ds
>         [DISK]    [Replica]                500 M  /home/rgao/noECBackup-ds-2
>          
>         Total:
>          
>         [Archive] [EC:RS-DEFAULT-6-3-64k]  100 M
>         [Archive] [Replica]                  0 M
>         [DISK]    [EC:RS-DEFAULT-6-3-64k]  250 M
>         [DISK]    [Replica]                700 M
>      
>         [Archive] [ALL]                    100 M
>         [DISK]    [ALL]                    950 M
>         [ALL]     [EC:RS-DEFAULT-6-3-64k]  350 M
>         [ALL]     [Replica]                700 M
> {code}     
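
The "Total:" section of the mock-up in the quoted description is a set of rollups over (storage, scheme) pairs. The following is a rough sketch of that arithmetic only, using the sizes from the mock-up as hard-coded input. ENTRIES and summarize are hypothetical names for illustration; nothing here reflects how the attached patch or the HDFS client actually computes usage.

```python
from collections import defaultdict

# Hypothetical per-directory records: (path, storage, scheme, size in MB).
# The values are copied from the mock-up in the issue description.
ENTRIES = [
    ("/home/rgao/ds", "Archive", "EC:RS-DEFAULT-6-3-64k", 100),
    ("/home/rgao/ds-2", "DISK", "EC:RS-DEFAULT-6-3-64k", 250),
    ("/home/rgao/noECBackup-ds", "DISK", "Replica", 200),
    ("/home/rgao/noECBackup-ds-2", "DISK", "Replica", 500),
]


def summarize(entries):
    """Return usage totals keyed by (storage, scheme), plus "ALL" rollups."""
    totals = defaultdict(int)
    for _path, storage, scheme, size_mb in entries:
        totals[(storage, scheme)] += size_mb   # e.g. [DISK] [Replica]
        totals[(storage, "ALL")] += size_mb    # e.g. [DISK] [ALL]
        totals[("ALL", scheme)] += size_mb     # e.g. [ALL] [Replica]
    return dict(totals)


if __name__ == "__main__":
    for (storage, scheme), size_mb in sorted(summarize(ENTRIES).items()):
        print(f"[{storage}] [{scheme}] {size_mb} M")
```

A pair with no data, such as Archive with Replica, simply does not appear in the result; a caller that wants the explicit "0 M" row from the mock-up could look it up with totals.get(key, 0).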



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

