hadoop-hdfs-issues mailing list archives

From "Zhe Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8833) Erasure coding: store EC schema and cell size with INodeFile and eliminate EC zones
Date Fri, 31 Jul 2015 07:17:05 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648858#comment-14648858 ]

Zhe Zhang commented on HDFS-8833:
---------------------------------

Thanks for the discussions, guys!

[~walter.k.su] Good catch that we are still storing the EC policy at the directory level. However,
a directory is no longer a zone, given the expected properties of a _zone_ that Nicholas
[summarized | https://issues.apache.org/jira/browse/HDFS-8833?focusedCommentId=14648073&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14648073].
I'll update the JIRA summary soon.

I like the hybrid solution Andrew proposed. It looks like a good long-term solution. [~vinayrpet]
Let me know whether it addresses the memory overhead concern you raised.

bq. What is the semantic of moving a file under EC zone A to EC zone B? Would the file be
changed from EC scheme A to EC schema B? If yes, we could eliminate EC zones. Otherwise, we
should keep EC zone.
Thanks for the example, Nicholas. Under the scope of this JIRA, the file's EC policy won't
be changed. If it was created under EC zone A, it will carry EC policy A with it when it is
moved. Could you explain a bit more why "If yes, we could eliminate EC zones. Otherwise, we
should keep EC zone."?

As a follow-on, we could enable an "inherit" mode similar to StoragePolicy.
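
To make the rename semantics above concrete, here is a minimal toy sketch, not HDFS code. The class {{ToyNamespace}}, its methods, and the policy names are all hypothetical, and it only looks at the immediate parent directory. It assumes a file copies its parent's EC policy at creation time and keeps that policy when it is moved.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these names are real HDFS classes or APIs.
class ToyNamespace {
    // Directory path -> EC policy name applied to files created directly under it.
    private final Map<String, String> dirPolicy = new HashMap<>();
    // File path -> EC policy name recorded on the file itself at creation.
    private final Map<String, String> filePolicy = new HashMap<>();

    void setDirPolicy(String dir, String policy) {
        dirPolicy.put(dir, policy);
    }

    private static String parentOf(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? "/" : path.substring(0, i);
    }

    // On create, the file copies the policy of its immediate parent directory.
    void createFile(String path) {
        String parent = parentOf(path);
        String policy = dirPolicy.get(parent);
        if (policy == null) {
            throw new IllegalStateException("no EC policy set on " + parent);
        }
        filePolicy.put(path, policy);
    }

    // Rename moves the file entry; the stored policy travels with the file.
    void rename(String src, String dst) {
        String policy = filePolicy.remove(src);
        if (policy == null) {
            throw new IllegalArgumentException("no such file: " + src);
        }
        filePolicy.put(dst, policy);
    }

    // Returns the policy stored with the file, or null if the file does not exist.
    String policyOf(String path) {
        return filePolicy.get(path);
    }
}
```

In this sketch, creating {{/a/f}} under a directory with policy A and then renaming it to {{/b/f}}, where {{/b}} has policy B, leaves {{policyOf("/b/f")}} at policy A. An "inherit" mode would instead resolve the policy from the destination directory at lookup time.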



> Erasure coding: store EC schema and cell size with INodeFile and eliminate EC zones
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-8833
>                 URL: https://issues.apache.org/jira/browse/HDFS-8833
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: HDFS-7285
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>
> We have [discussed | https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754]
> storing EC schema with files instead of EC zones and recently revisited the discussion under
> HDFS-8059.
> As a recap, the _zone_ concept has severe limitations, including restrictions on renaming and
> on nested configuration. Those limitations are valid in encryption for security reasons, and it
> doesn't make sense to carry them over to EC.
> This JIRA aims to store the EC schema and cell size at the {{INodeFile}} level. For simplicity,
> we should first implement it as an xattr and consider memory optimizations (such as moving
> it to the file header) as a follow-on. We should also disable changing the EC policy on a non-empty
> file / dir in the first phase.
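
As a rough illustration of what "store EC schema and cell size as an xattr" could look like, the sketch below packs a schema name and a cell size into a single byte array value and reads them back. The class {{EcXAttrValue}} and the byte layout are assumptions made for this example only; they are not the format HDFS uses, and the schema name "RS-6-3" in the usage note is just a sample value.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical encoding for illustration; not the HDFS on-disk or in-memory format.
final class EcXAttrValue {
    final String schemaName;
    final int cellSize;

    EcXAttrValue(String schemaName, int cellSize) {
        if (cellSize <= 0) {
            throw new IllegalArgumentException("cellSize must be positive");
        }
        this.schemaName = schemaName;
        this.cellSize = cellSize;
    }

    // Assumed layout: 4-byte big-endian cell size, then the schema name as UTF-8.
    byte[] toBytes() {
        byte[] name = schemaName.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + name.length).putInt(cellSize).put(name).array();
    }

    // Decodes a value produced by toBytes().
    static EcXAttrValue fromBytes(byte[] value) {
        if (value.length < 4) {
            throw new IllegalArgumentException("value too short");
        }
        int cellSize = ByteBuffer.wrap(value).getInt();
        String name = new String(value, 4, value.length - 4, StandardCharsets.UTF_8);
        return new EcXAttrValue(name, cellSize);
    }
}
```

For example, {{new EcXAttrValue("RS-6-3", 65536).toBytes()}} yields a 10-byte value under this layout. A per-file cost of this kind is what the memory optimization mentioned above would aim to reduce.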



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
