hadoop-hdfs-issues mailing list archives

From "Kai Zheng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode
Date Thu, 07 Sep 2017 20:53:01 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157633#comment-16157633 ]

Kai Zheng commented on HDFS-7859:
---------------------------------

bq. Checking whether any file / directory is using this particular policy is potentially an O(n) operation,
where n = # of inodes. I feel that it is OK to leave it in fsimage as garbage for now. In
the future, we can let the fsimage loading process handle this garbage, as it is O(n).
I discussed this with Sammi offline earlier; we can do it very cheaply, as below:
{code}
# set of all policies used by files/directories
usedPoliciesSet = ();

# while loading inodes from fsimage, add the following two lines to the loop body
foreach (in: inodes) {
  policyId = getPolicyIdFromInode(in) # a bitwise op, very cheap
  usedPoliciesSet.add(policyId)
}

# once all inodes are loaded, add the following post step
ErasureCodingPolicyManager.getInstance().updateWithUsedPolicies(usedPoliciesSet)

# in ErasureCodingPolicyManager.updateWithUsedPolicies, it's a simple step to clean up
# removed policies that do not appear in the used policies set.
{code}
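
For illustration, the steps above could look roughly like this in Java. This is only a sketch under stated assumptions: {{PolicyManager}}, {{PolicyState}}, the {{long}} inode header and its "policy ID in the low 8 bits" layout are hypothetical stand-ins, not the real NameNode classes or the actual fsimage format.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Sketch only: none of these types are the real HDFS classes. */
final class EcPolicyCleanupSketch {

  enum PolicyState { ENABLED, DISABLED, REMOVED }

  /** Hypothetical stand-in for ErasureCodingPolicyManager. */
  static final class PolicyManager {
    private final Map<Byte, PolicyState> policies = new HashMap<>();

    void addPolicy(byte id) {
      policies.put(id, PolicyState.ENABLED);
    }

    /** Marks a policy REMOVED without checking whether any inode uses it. */
    void removePolicy(byte id) {
      policies.computeIfPresent(id, (key, state) -> PolicyState.REMOVED);
    }

    /** Post-load step: drop REMOVED policies that no loaded inode references. */
    void updateWithUsedPolicies(Set<Byte> usedPolicies) {
      policies.entrySet().removeIf(e ->
          e.getValue() == PolicyState.REMOVED && !usedPolicies.contains(e.getKey()));
    }

    boolean contains(byte id) {
      return policies.containsKey(id);
    }
  }

  /** Assumed layout for this sketch: policy ID in the low 8 bits of the header. */
  static byte getPolicyIdFromInode(long inodeHeader) {
    return (byte) (inodeHeader & 0xFF);
  }

  /** Collects the policy IDs seen while iterating over the loaded inodes. */
  static Set<Byte> collectUsedPolicies(long[] inodeHeaders) {
    Set<Byte> used = new HashSet<>();
    for (long header : inodeHeaders) {
      used.add(getPolicyIdFromInode(header));
    }
    return used;
  }

  public static void main(String[] args) {
    PolicyManager manager = new PolicyManager();
    manager.addPolicy((byte) 1);
    manager.addPolicy((byte) 2);
    manager.addPolicy((byte) 3);
    manager.removePolicy((byte) 2); // still referenced by an inode below
    manager.removePolicy((byte) 3); // referenced by no inode

    long[] inodeHeaders = {0x1001L, 0x2002L, 0x3001L};
    manager.updateWithUsedPolicies(collectUsedPolicies(inodeHeaders));

    System.out.println("policy 2 kept: " + manager.contains((byte) 2));
    System.out.println("policy 3 kept: " + manager.contains((byte) 3));
  }
}
```

In this sketch a removed policy that is still referenced stays in the map, and only unreferenced removed policies are purged; the extra cost during loading is one set insertion per inode.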

bq. Here or elsewhere, please ensure no policy can be DISABLED/REMOVED if it's used by files,
with necessary tests.
Let me correct myself. We should allow policies to be disabled/removed regardless of whether
they're in use. It would be too much overhead to track policy usage while the NN is running
and operating on lots of files. We can just do a post clean-up as illustrated above.

I'm fine with leaving the policy clean-up as future work, but if it sounds good, maybe
we can get it done before 3.0 GA. It should be OK since it doesn't involve any API change.

bq. Could this happen before BETA 1? It seems to be a breaking change. If not, do we have
a plan to preserve both this key and the capability of adding/removing policies?
I agree, we should get it done this time. Actually, IIRC, this was already done, but Sammi
may need to double-check it and clean up anything left over.

> Erasure Coding: Persist erasure coding policies in NameNode
> -----------------------------------------------------------
>
>                 Key: HDFS-7859
>                 URL: https://issues.apache.org/jira/browse/HDFS-7859
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: SammiChen
>              Labels: hdfs-ec-3.0-must-do
>         Attachments: HDFS-7859.001.patch, HDFS-7859.002.patch, HDFS-7859.004.patch, HDFS-7859.005.patch,
HDFS-7859.006.patch, HDFS-7859.007.patch, HDFS-7859.008.patch, HDFS-7859.009.patch, HDFS-7859.010.patch,
HDFS-7859.011.patch, HDFS-7859.012.patch, HDFS-7859.013.patch, HDFS-7859.014.patch, HDFS-7859.015.patch,
HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch
>
>
> In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we persist EC schemas
in NameNode centrally and reliably, so that EC zones can reference them by name efficiently.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

