hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8833) Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones
Date Wed, 05 Aug 2015 18:10:05 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658614#comment-14658614 ]

Jing Zhao commented on HDFS-8833:

Thanks for the discussion guys. Some thoughts here:

One thing I learnt from the storage policy is that using several bits in the INodeFile's header
to represent some policy can always become a big limitation in the end. When we allow users
to define/add new schemas, they always want to use/define more and more policies (considering
we also want to push the cell size into the policy). How to limit the total number of policies,
whether to allow them to modify or delete an existing policy when the limit is hit, and how
to change the INodeFile layout to support more policies, will all become challenges for us
(HDFS-7076 is an example).
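To illustrate the limit described above, here is a minimal sketch of a policy ID packed into a few bits of a 64-bit header. It is not the actual INodeFile layout: the 4-bit width, the bit position, and the function names are all assumptions chosen for illustration.

```python
# Illustrative model only. The field width and position below are
# assumptions, not the real INodeFile header layout.
POLICY_BITS = 4                      # hypothetical width of the policy field
POLICY_SHIFT = 64 - POLICY_BITS      # place the field in the top bits
POLICY_MASK = (1 << POLICY_BITS) - 1
MAX_POLICIES = 1 << POLICY_BITS      # hard cap imposed by the field width
HEADER_MASK = (1 << 64) - 1          # keep the header within 64 bits


def set_policy(header: int, policy_id: int) -> int:
    """Return a copy of header with the policy field set to policy_id."""
    if not 0 <= policy_id < MAX_POLICIES:
        raise ValueError(
            f"policy id {policy_id} does not fit in {POLICY_BITS} bits"
        )
    cleared = header & ~(POLICY_MASK << POLICY_SHIFT)
    return (cleared | (policy_id << POLICY_SHIFT)) & HEADER_MASK


def get_policy(header: int) -> int:
    """Extract the policy id from the header."""
    return (header >> POLICY_SHIFT) & POLICY_MASK
```

With a field of this width, the 17th policy cannot be represented at all, so supporting it would mean either reclaiming an existing ID or changing the header layout.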

If the main issue here is rename, I'd like to associate the ec policy to a file only during
the rename (instead of during the file creation), i.e., when this single file (not its ancestral
directory) is moved to another directory with a different ec setting. Also, instead of recording
the policy in the file header, I think we should only use an xattr. Considering most file renames
happen in the same directory, these xattrs will not cost much memory.
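A rough sketch of this idea, as a toy in-memory model rather than NameNode code: a file inherits the EC setting of its parent directory and only gets its own xattr when it is renamed into a directory with a different setting. The xattr name, the policy labels, and the `Dir`/`File`/`rename` names are hypothetical, and only single-file renames are modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical xattr key; the real name (if any) is not given in this thread.
EC_XATTR = "system.ec.policy"


@dataclass
class Dir:
    name: str
    ec_policy: str = "replication"   # hypothetical label for "no EC"


@dataclass
class File:
    name: str
    parent: Dir
    xattrs: Dict[str, str] = field(default_factory=dict)

    def effective_policy(self) -> str:
        # A policy pinned on the file wins; otherwise inherit from the parent.
        return self.xattrs.get(EC_XATTR, self.parent.ec_policy)


def rename(f: File, dest: Dir, new_name: Optional[str] = None) -> None:
    """Move a single file, pinning its policy only if the destination differs."""
    current = f.effective_policy()
    if EC_XATTR not in f.xattrs and dest.ec_policy != current:
        # The data is already laid out under `current`, so record it on the file.
        f.xattrs[EC_XATTR] = current
    f.parent = dest
    if new_name is not None:
        f.name = new_name
```

In this model a rename within the same directory never adds an xattr, so the memory cost is paid only by files that actually cross a policy boundary.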

> Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones
> -------------------------------------------------------------------------------------------
>                 Key: HDFS-8833
>                 URL: https://issues.apache.org/jira/browse/HDFS-8833
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: HDFS-7285
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
> We have [discussed | https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754]
> storing EC schema with files instead of EC zones and recently revisited the discussion under
> As a recap, the _zone_ concept has severe limitations including renaming and nested configuration.
> Those limitations are valid in encryption for security reasons and it doesn't make sense to
> carry them over in EC.
> This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For simplicity,
> we should first implement it as an xattr and consider memory optimizations (such as moving
> it to file header) as a follow-on. We should also disable changing EC policy on a non-empty
> file / dir in the first phase.

This message was sent by Atlassian JIRA
