hadoop-hdfs-issues mailing list archives

From "SammiChen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11072) Add ability to unset and change directory EC policy
Date Wed, 07 Dec 2016 12:12:58 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15728618#comment-15728618 ]

SammiChen commented on HDFS-11072:

Andrew, thanks very much for taking the time to review the patch!

bq. Can we just say "replication" rather than "continuous replicate"? e.g. "getReplicationPolicy"
instead of "getContinuousReplicatePolicy"

"Continuous replicate" was chosen because I thought there would be a combination of "replication"
plus "erasure coding" in the planned phase 2 of erasure coding. So I use "continuous replicate"
to distinguish it from a future "erasure coding replicate". Does that make sense?

bq. Note that setting a "replication" EC policy is still different from unsetting. Unsetting
means the policy will be inherited from an ancestor. Setting a "replication" policy means
the "replication" policy will be used. Imagine a situation where "/a" has RS 6,3
set and "/a/b" has XOR 2,1 set. On "/a/b", unsetting vs. setting "replication" will have different
effects. So we also need an unset API, similar to the unset storage policy API.

I agree with you and the implementation matches your thoughts. And I will add a new unset
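
To illustrate the difference between unsetting and setting "replication", here is a minimal sketch
of ancestor-based policy lookup. It is not the HDFS implementation: the function name
effective_policy, the dictionary of per-directory policies, and the policy strings are all
hypothetical, chosen only to mirror the "/a" and "/a/b" example above.

```python
from typing import Dict, Optional

REPLICATION = "replication"  # hypothetical name for the explicit replication policy


def effective_policy(path: str, policies: Dict[str, str]) -> Optional[str]:
    """Return the policy set on the path or its nearest ancestor; None if none is set."""
    current = path.rstrip("/") or "/"
    while True:
        if current in policies:
            return policies[current]
        if current == "/":
            return None
        # Move up one level; "/a" becomes "/".
        current = current.rsplit("/", 1)[0] or "/"


policies = {"/a": "RS-6-3", "/a/b": "XOR-2-1"}

# Unsetting removes the entry, so "/a/b" falls back to its ancestor "/a".
unset = {k: v for k, v in policies.items() if k != "/a/b"}
print(effective_policy("/a/b", unset))  # RS-6-3

# Setting "replication" keeps an explicit entry that overrides the ancestor.
replicated = dict(policies, **{"/a/b": REPLICATION})
print(effective_policy("/a/b", replicated))  # replication
```

In this sketch an unset directory has no entry at all, which is why it inherits, while a directory
set to "replication" still has an entry that stops the lookup.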

bq. Do the parameters "1-2-64K" have any meaning? If not, we should explain that they are
meaningless, or hide the parameters so we don't need to talk about them.

"1-2-64K" is auto-generated from the schema when the replicate policy is defined. The values are
meaningless. At first, I used "null" as the schema to define the policy, but then I found there
is a checker that the schema can't be null. I then used schema (0-0-0), but it breaks other checkers.
I think we would like to keep these checkers to catch mistakes made in real EC policies, so
in the end I chose "1-2-64k", which means 1 data block and 2 parity blocks, roughly matching
the default 3 replication case. As Rakesh has suggested, adding a new unset API and a new
unset policy subcommand in "erasurecode" makes the replicate policy internal, so users will
not see the policy unless they read the source code.
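
As a rough sketch of why "null" and (0-0-0) are rejected while "1-2-64K" passes, the checks can be
pictured as below. This is an assumption for illustration only: parse_schema and its rules are
hypothetical and are not the actual Hadoop checkers, which may validate different things.

```python
from typing import Optional, Tuple


def parse_schema(name: Optional[str]) -> Tuple[int, int, int]:
    """Parse "<data>-<parity>-<cell>K" into (data units, parity units, cell size in bytes)."""
    if name is None:
        raise ValueError("schema must not be null")
    data_s, parity_s, cell_s = name.split("-")
    data, parity = int(data_s), int(parity_s)
    cell = int(cell_s.rstrip("kK")) * 1024
    # Hypothetical sanity check: every value must be positive.
    if data <= 0 or parity <= 0 or cell <= 0:
        raise ValueError("schema values must be positive: " + name)
    return data, parity, cell


data, parity, cell = parse_schema("1-2-64K")
print(data + parity)  # 3 blocks in total, the same count as 3x replication
```

With checks like these, a placeholder schema has to carry positive values, and 1 data block plus
2 parity blocks gives the same block count as the default 3 replicas.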

I will take care of all other comments in the new patch. 


> Add ability to unset and change directory EC policy
> ---------------------------------------------------
>                 Key: HDFS-11072
>                 URL: https://issues.apache.org/jira/browse/HDFS-11072
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Andrew Wang
>            Assignee: SammiChen
>              Labels: hdfs-ec-3.0-must-do
>         Attachments: HDFS-11072-v1.patch, HDFS-11072-v2.patch, HDFS-11072-v3.patch, HDFS-11072-v4.patch
> Since the directory-level EC policy simply applies to files at create time, it makes
> sense to make it more similar to storage policies and allow changing and unsetting the policy.
