hadoop-hdfs-issues mailing list archives

From "Walter Su (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8833) Erasure coding: store EC schema and cell size with INodeFile and eliminate EC zones
Date Thu, 30 Jul 2015 00:55:05 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646996#comment-14646996 ]

Walter Su commented on HDFS-8833:

Three places to store schema and cellSize: {{zone level}}, {{file level}}, {{block level}}.
I think file level is not a bad idea.

We can have an abstraction for {{schema + cellSize}} ([link|https://issues.apache.org/jira/browse/HDFS-8059?focusedCommentId=14630027&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14630027]),
which we call {{ECPolicy}}. We give each {{ECPolicy}} an ID. Assuming at most 16 policy types,
the ID fits in 4 bits, which we store in the file header.
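
To make the 4-bit idea concrete, here is a minimal sketch of packing an {{ECPolicy}} ID into a 64-bit file header. The class name {{EcPolicyHeader}}, the method names, and the choice of the top 4 bits are assumptions for illustration only; the actual bit layout would depend on what HDFS-8823 frees up in the {{INodeFile}} header.

```java
// Hypothetical sketch: store a 4-bit ECPolicy ID in a 64-bit file header.
// The bit position (top 4 bits) is an assumption, not the real INodeFile layout.
final class EcPolicyHeader {
    private static final int ID_BITS = 4;                      // at most 16 policies
    private static final int ID_SHIFT = Long.SIZE - ID_BITS;   // 60
    private static final long ID_MASK = ((1L << ID_BITS) - 1) << ID_SHIFT;

    private EcPolicyHeader() {}

    /** Returns a copy of the header with the policy ID field replaced. */
    static long withPolicyId(long header, int policyId) {
        if (policyId < 0 || policyId >= (1 << ID_BITS)) {
            throw new IllegalArgumentException("policy ID must fit in 4 bits: " + policyId);
        }
        // Clear the old ID bits, then set the new ones; other bits are untouched.
        return (header & ~ID_MASK) | ((long) policyId << ID_SHIFT);
    }

    /** Extracts the policy ID field from the header. */
    static int policyId(long header) {
        return (int) ((header & ID_MASK) >>> ID_SHIFT);
    }
}
```

With this kind of encoding the ID costs no extra memory per file, and the remaining 60 bits of the header stay available for the other fields. A separate table (not shown) would map each ID to its schema and cell size.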

bq. EC schema / cellSize are information at the storage layer. ([Link|https://issues.apache.org/jira/browse/HDFS-8059?focusedCommentId=14630433&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14630433])
I agree with [~wheat9]. So we need 4 bits from HDFS-8823 to store the {{ECPolicy}} ID.

bq. The proposed change merely moves EC policy from zone XAttr to file XAttr, which is a change
within namespace.
Yes, the {{ECPolicy}} should be stored in the namespace. It should also be copied (tracked) at the
storage layer. The 4 bits from HDFS-8823 can solve that.

bq. Also in general pushing ec schema to file creation API level will make the whole management
work extremely hard for admin.
I understand your concern, but we need to think about the big picture. {{file level}} has fewer limitations.
One reason is what [~andrew.wang] said ([link|https://issues.apache.org/jira/browse/HDFS-8059?focusedCommentId=14630418&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14630418]):
the Trash feature can be easily supported. The other reason is that we can convert a non-EC file to an EC file
in place.

> Erasure coding: store EC schema and cell size with INodeFile and eliminate EC zones
> -----------------------------------------------------------------------------------
>                 Key: HDFS-8833
>                 URL: https://issues.apache.org/jira/browse/HDFS-8833
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: HDFS-7285
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
> We have [discussed | https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754]
> storing EC schema with files instead of EC zones and recently revisited the discussion under
> As a recap, the _zone_ concept has severe limitations including renaming and nested configuration.
> Those limitations are valid in encryption for security reasons and it doesn't make sense to
> carry them over in EC.
> This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For simplicity,
> we should first implement it as an xattr and consider memory optimizations (such as moving
> it to file header) as a follow-on. We should also disable changing EC policy on a non-empty
> file / dir in the first phase.
