hadoop-hdfs-issues mailing list archives

From "Zhe Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8833) Erasure coding: store EC schema and cell size with INodeFile and eliminate EC zones
Date Thu, 30 Jul 2015 05:20:05 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647202#comment-14647202 ]

Zhe Zhang commented on HDFS-8833:
---------------------------------

Thanks Walter for the summary and comments. If we generalize the discussion a bit, {{replication factor}},
{{EC policy}} and {{storage-type policy}} all have a *directory - file - block* inheritance
issue. A policy defined at the directory level is only the preferred default for the files
and sub-dirs under it, and likewise a file's policy is only the default for its blocks.
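
As a rough illustration of the inheritance described above (directory level only), here is a minimal sketch. {{PolicyNode}} and the policy names are hypothetical and are not HDFS classes or constants; the only point is that an explicit policy on a node overrides the default inherited from its ancestors.

```java
/**
 * Hypothetical namespace node used only for illustration; this is NOT
 * HDFS's INode hierarchy. A null ownPolicy means "inherit from parent".
 */
final class PolicyNode {
    final String name;
    final PolicyNode parent;   // null for the root
    final String ownPolicy;    // explicit policy, or null to inherit

    PolicyNode(String name, PolicyNode parent, String ownPolicy) {
        this.name = name;
        this.parent = parent;
        this.ownPolicy = ownPolicy;
    }

    /**
     * Walks from this node towards the root and returns the first explicit
     * policy found, falling back to the given system default.
     */
    String effectivePolicy(String systemDefault) {
        for (PolicyNode n = this; n != null; n = n.parent) {
            if (n.ownPolicy != null) {
                return n.ownPolicy;
            }
        }
        return systemDefault;
    }
}
```

With this model, a directory's policy says nothing definite about its descendants: any child may carry its own explicit value, which is why a directory-level query cannot be read as a statement about every file under it.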

[~jingzhao] It seems your concern about admin complexity is mostly about *A)* creating a file
with a given EC policy, and *B)* setting an EC policy on a non-empty file? I totally agree that
*A)* requires a non-trivial change to the file creation APIs and *B)* requires a sophisticated conversion
protocol, as [discussed | https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14358145&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14358145]
earlier. That's why I excluded them from the scope of this JIRA.

Are you OK with lifting the restrictions on nested policies (a file or dir having a different
policy from its parent) and on renaming? If so, maybe we can proceed with this JIRA and discuss
the wider scope separately? Per the above discussion, when calling {{getStoragePolicy}} or {{ls}},
an admin shouldn't expect the returned {{storage policy}} and {{replication factor}} of a
directory to be the same for all its descendants anyway. So nested EC policies shouldn't break
admins' expectations.

> Erasure coding: store EC schema and cell size with INodeFile and eliminate EC zones
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-8833
>                 URL: https://issues.apache.org/jira/browse/HDFS-8833
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: HDFS-7285
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>
> We have [discussed | https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754]
storing EC schema with files instead of EC zones and recently revisited the discussion under
HDFS-8059.
> As a recap, the _zone_ concept has severe limitations, including restrictions on renaming and nested configuration.
Those limitations are valid in encryption for security reasons, and it doesn't make sense to
carry them over to EC.
> This JIRA aims to store the EC schema and cell size at the {{INodeFile}} level. For simplicity,
we should first implement it as an xattr and consider memory optimizations (such as moving
it to the file header) as a follow-on. We should also disable changing the EC policy on a non-empty
file / dir in the first phase.
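
To make the xattr idea in the description concrete, here is a minimal sketch of packing a schema name and cell size into a single xattr value. The class {{EcXAttrCodec}} and the byte layout (4-byte big-endian cell size followed by the UTF-8 schema name) are assumptions made up for this example; they are not the format HDFS uses.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical codec for an EC xattr value. The layout is an assumption
 * for illustration only: [int cellSize][UTF-8 schema name].
 */
final class EcXAttrCodec {
    private EcXAttrCodec() {}

    /** Packs the cell size and schema name into one byte array. */
    static byte[] encode(String schemaName, int cellSize) {
        byte[] name = schemaName.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Integer.BYTES + name.length)
                .putInt(cellSize)
                .put(name)
                .array();
    }

    /** Reads the cell size from the first four bytes. */
    static int cellSize(byte[] value) {
        return ByteBuffer.wrap(value).getInt();
    }

    /** Reads the schema name from the remaining bytes. */
    static String schemaName(byte[] value) {
        return new String(value, Integer.BYTES,
                value.length - Integer.BYTES, StandardCharsets.UTF_8);
    }
}
```

Because the value travels with the file rather than with an enclosing zone, a rename does not need to touch it in this model, which is the property the description is after.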



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
