hadoop-hdfs-issues mailing list archives

From "SammiChen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode
Date Fri, 08 Sep 2017 07:44:02 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16158252#comment-16158252 ]

SammiChen commented on HDFS-7859:

Thanks [~eddyxu] and [~drankye] for reviewing the patch and providing very detailed suggestions!
Could you consider using:
message ErasureCodingPolicyManagerSection {
   repeated ErasureCodingPolicyProto policies = 1;
// dd new erasure coding policy
ECSchema newSchema = new ECSchema("rs", 5, 3);
The comment should read "add new erasure coding policy". It's a typo.

bq. Checking whether any file / directory is using this particular policy is a potentially O(n) operation,
where n = # of inodes. I feel that it is OK to leave it in fsimage as garbage for now. In
the future, we can let the fsimage loading process handle this garbage, as it is O(n).
HDFS-12405 has been created to track permanently deleting the policy from the system at NameNode restart
time. I will start working on it after beta1.
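
To illustrate the idea of deferring the cleanup to load time, here is a minimal sketch, not the HDFS-12405 implementation. It assumes a single pass over the inodes is already happening while the image is loaded, collects the policy IDs still referenced, and then drops only those removed policies that nothing references. The class `RemovedPolicyPurger`, the `removedFlagById` map, and the use of 0 to mean "no policy" are all hypothetical names and conventions chosen for the example.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch; the real NameNode data structures differ. */
final class RemovedPolicyPurger {

  /**
   * @param removedFlagById policy ID mapped to true if the policy was marked removed
   * @param inodePolicyIds  policy ID referenced by each inode (0 is assumed to mean none)
   * @return the policies that must be kept after loading
   */
  static Map<Integer, Boolean> purge(Map<Integer, Boolean> removedFlagById,
                                     int[] inodePolicyIds) {
    // One O(n) pass over the inodes, piggybacking on the load.
    Set<Integer> inUse = new HashSet<>();
    for (int id : inodePolicyIds) {
      if (id != 0) {
        inUse.add(id);
      }
    }

    // Keep every active policy, and every removed policy that is still referenced.
    Map<Integer, Boolean> kept = new TreeMap<>();
    for (Map.Entry<Integer, Boolean> e : removedFlagById.entrySet()) {
      boolean removed = e.getValue();
      if (!removed || inUse.contains(e.getKey())) {
        kept.put(e.getKey(), removed);
      }
    }
    return kept;
  }
}
```

The point of the sketch is only that the O(n) scan costs nothing extra when it rides along with a pass that restart performs anyway.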

Regarding the policy ID design, are there general rules for customized EC policy design?
My question is: what ID value range can be chosen for a customized policy? Currently
the system EC policies use values up to 5. If a customer / vendor provides a new EC policy
with ID=6, then when the next version of Hadoop adds a new EC policy, how do we handle the conflict
(i.e., ID=6 has already been used) in fsimage and INode? Or what if a customer uses policies from two vendors
who accidentally use the same IDs? SammiChen could you add some test cases like this as future
Here are the general rules for customized EC policies:
1. When a user adds a customized EC policy, the user specifies the codec name, number of data units, number of parity
units, and cell size. The policy ID and policy name are generated automatically by the system.
Customized EC policy IDs start from 64 and are incremented atomically, so in general no two
policies in the same system will have the same policy ID.
2. System built-in policy IDs range from 1 to 63, so system policies and customized policies
have different ID ranges.
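
The two rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the patch: the class name `PolicyIdAllocator`, its methods, and the generated name format (codec, data units, parity units, cell size in KB) are hypothetical. Only the ranges (1 to 63 for system policies, 64 and up for customized ones) and the atomic increment come from the rules.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch; not the actual ErasureCodingPolicyManager code. */
final class PolicyIdAllocator {
  static final int MAX_SYSTEM_POLICY_ID = 63;   // built-in policies use 1..63
  static final int FIRST_USER_POLICY_ID = 64;   // customized policies start here

  private final AtomicInteger nextUserId = new AtomicInteger(FIRST_USER_POLICY_ID);
  private final Map<Integer, String> namesById = new ConcurrentHashMap<>();

  /** The caller supplies only the schema and cell size; ID and name are generated. */
  int addUserPolicy(String codec, int dataUnits, int parityUnits, int cellSizeBytes) {
    int id = nextUserId.getAndIncrement();      // atomic, so concurrent adds never collide
    // Assumed name format, for illustration only.
    String name = codec.toUpperCase(Locale.ROOT) + "-" + dataUnits + "-" + parityUnits
        + "-" + (cellSizeBytes / 1024) + "k";
    namesById.put(id, name);
    return id;
  }

  static boolean isSystemPolicyId(int id) {
    return id >= 1 && id <= MAX_SYSTEM_POLICY_ID;
  }

  String nameOf(int id) {
    return namesById.get(id);
  }
}
```

Because users never pick an ID themselves, two vendors cannot ship policies with clashing IDs, and a future built-in policy stays below 64. A real implementation would presumably also need to restore the counter from the persisted policies after a restart; that part is not shown here.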

Question to Kai Zheng:
I thought "dfs.namenode.ec.policies.enabled" should have been removed when adding the API
to enable/disable policy.
Could this happen before BETA 1? It seems to be a breaking change. If not, do we have a plan
to preserve both this key and the capability of adding/removing policies?
I'd like to have input from [~andrew.wang]. I'm fine with the idea.

bq. Again, could we make the change: ErasureCodingPolicyManagerSection => ErasureCodingSection.
Also check related names like loadErasureCodingPolicyManagerSection, saveErasureCodingPolicyManagerSection.
The existing "CacheManagerSection" saves cache directives for CacheManager, and "SecretManagerSection"
saves secrets for SecretManager. So it's better to follow that style and use "ErasureCodingPolicyManagerSection"
to save the EC policies for ErasureCodingPolicyManager.

All other comments will be taken care of in the next patch.

> Erasure Coding: Persist erasure coding policies in NameNode
> -----------------------------------------------------------
>                 Key: HDFS-7859
>                 URL: https://issues.apache.org/jira/browse/HDFS-7859
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: SammiChen
>              Labels: hdfs-ec-3.0-must-do
>         Attachments: HDFS-7859.001.patch, HDFS-7859.002.patch, HDFS-7859.004.patch, HDFS-7859.005.patch,
HDFS-7859.006.patch, HDFS-7859.007.patch, HDFS-7859.008.patch, HDFS-7859.009.patch, HDFS-7859.010.patch,
HDFS-7859.011.patch, HDFS-7859.012.patch, HDFS-7859.013.patch, HDFS-7859.014.patch, HDFS-7859.015.patch,
HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch
> In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we persist EC schemas
in NameNode centrally and reliably, so that EC zones can reference them by name efficiently.
