hadoop-hdfs-issues mailing list archives

From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8833) Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones
Date Wed, 05 Aug 2015 22:55:05 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659128#comment-14659128 ]

Andrew Wang commented on HDFS-8833:
-----------------------------------

Good question. I think it basically comes down to our deployment scenarios being broader
than Quantcast's or Facebook's. For fault tolerance you want the number of racks to equal the
stripe width (data + parity blocks). FB and Quantcast are big enough to run 14-rack or 9-rack
clusters, but not all of our customers are at that scale. So there isn't a one-size-fits-all
schema that works for all HDFS users: the big ones will use (10,4) or (6,3) like FB and
Quantcast, but the smaller ones will want (3,2).
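
To make the rack arithmetic concrete, here is a small sketch. It assumes a stripe's blocks are spread over the racks as evenly as possible and that the racks holding the most blocks fail first; ECSchema, rack_failures_tolerated and full_rack_tolerance are hypothetical names for illustration, not HDFS's block placement policy or API.

```python
from typing import List, NamedTuple


class ECSchema(NamedTuple):
    """A (data, parity) erasure-coding layout such as (6, 3)."""
    data_units: int
    parity_units: int

    @property
    def stripe_width(self) -> int:
        # Every stripe is made of data_units + parity_units blocks.
        return self.data_units + self.parity_units


def rack_failures_tolerated(schema: ECSchema, racks: int) -> int:
    """Worst-case number of whole-rack failures one stripe survives.

    Assumes the stripe's blocks are spread over the racks as evenly as
    possible and that the racks holding the most blocks fail first.
    """
    base, extra = divmod(schema.stripe_width, racks)
    # 'extra' racks hold base + 1 blocks, the remaining racks hold base.
    per_rack = [base + 1] * extra + [base] * (racks - extra)
    lost = tolerated = 0
    for blocks in per_rack:  # already ordered from most to fewest blocks
        if lost + blocks > schema.parity_units:
            break
        lost += blocks
        tolerated += 1
    return tolerated


def full_rack_tolerance(schemas: List[ECSchema], racks: int) -> List[ECSchema]:
    """Schemas that get at most one block per rack (stripe width <= racks)."""
    return [s for s in schemas if s.stripe_width <= racks]


if __name__ == "__main__":
    schemas = [ECSchema(10, 4), ECSchema(6, 3), ECSchema(3, 2)]
    for racks in (5, 9, 14):
        fits = full_rack_tolerance(schemas, racks)
        print(racks, "racks:", [(s.data_units, s.parity_units) for s in fits])
```

Under these assumptions, rack_failures_tolerated(ECSchema(6, 3), 5) returns 1, because four of the five racks must hold two blocks each, while ECSchema(3, 2) on the same five racks survives two rack failures. That is the sense in which a smaller cluster "wants" (3,2).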

I've also seen customers starting with small clusters and growing them by adding racks over
time. This is also somewhat unique to HDFS compared to QFS and f4, and a reason why it'd be
nice to support a few different policies even within the same cluster.

> Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones
> -------------------------------------------------------------------------------------------
>
>                 Key: HDFS-8833
>                 URL: https://issues.apache.org/jira/browse/HDFS-8833
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: HDFS-7285
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>
> We have [discussed | https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754]
> storing EC schema with files instead of EC zones and recently revisited the discussion under
> HDFS-8059.
> As a recap, the _zone_ concept has severe limitations, including restrictions on renaming and
> nested configuration. Those limitations are justified in encryption for security reasons, but
> it doesn't make sense to carry them over to EC.
> This JIRA aims to store EC schema and cell size at the {{INodeFile}} level. For simplicity,
> we should first implement it as an xattr and consider memory optimizations (such as moving
> it to the file header) as a follow-on. We should also disable changing the EC policy on a
> non-empty file / dir in the first phase.
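
A toy model of the per-file approach described above, as a sketch only. Assumptions not taken from the issue: the xattr key EC_POLICY_XATTR, the classes ECPolicy, INode and Namespace, and the rule that a new file copies the policy of its nearest ancestor directory at creation time are all hypothetical. It shows the two properties the description asks for: the policy lives on the file itself, so a rename carries it along, and setting a policy on a non-empty file or directory is refused.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical xattr key used only for this sketch.
EC_POLICY_XATTR = "example.ec.policy"


@dataclass(frozen=True)
class ECPolicy:
    data_units: int
    parity_units: int
    cell_size: int  # bytes per striping cell


@dataclass
class INode:
    is_dir: bool
    length: int = 0  # bytes written; only meaningful for files
    xattrs: Dict[str, ECPolicy] = field(default_factory=dict)
    children: Dict[str, "INode"] = field(default_factory=dict)

    def is_empty(self) -> bool:
        return not self.children if self.is_dir else self.length == 0


class Namespace:
    """Toy namespace where every file records its own EC policy; no zones."""

    def __init__(self) -> None:
        self.root = INode(is_dir=True)

    @staticmethod
    def _split(path: str) -> List[str]:
        return [p for p in path.split("/") if p]

    def _lookup(self, path: str) -> INode:
        node = self.root
        for part in self._split(path):
            node = node.children[part]
        return node

    def _parent_and_name(self, path: str) -> Tuple[INode, str]:
        parts = self._split(path)
        return self._lookup("/".join(parts[:-1])), parts[-1]

    def mkdir(self, path: str) -> None:
        parent, name = self._parent_and_name(path)
        parent.children[name] = INode(is_dir=True)

    def set_policy(self, path: str, policy: ECPolicy) -> None:
        node = self._lookup(path)
        # First phase: the policy of a non-empty file or directory is frozen.
        if not node.is_empty():
            raise ValueError(f"cannot change EC policy of non-empty {path}")
        node.xattrs[EC_POLICY_XATTR] = policy

    def create(self, path: str) -> None:
        parent, name = self._parent_and_name(path)
        # Copy the policy of the nearest ancestor that has one into the file.
        inherited: Optional[ECPolicy] = self.root.xattrs.get(EC_POLICY_XATTR)
        node = self.root
        for part in self._split(path)[:-1]:
            node = node.children[part]
            inherited = node.xattrs.get(EC_POLICY_XATTR, inherited)
        new_file = INode(is_dir=False)
        if inherited is not None:
            new_file.xattrs[EC_POLICY_XATTR] = inherited
        parent.children[name] = new_file

    def append(self, path: str, nbytes: int) -> None:
        self._lookup(path).length += nbytes

    def rename(self, src: str, dst: str) -> None:
        src_parent, src_name = self._parent_and_name(src)
        dst_parent, dst_name = self._parent_and_name(dst)
        # The inode moves as-is, so its EC xattr travels with it.
        dst_parent.children[dst_name] = src_parent.children.pop(src_name)

    def policy_of(self, path: str) -> Optional[ECPolicy]:
        return self._lookup(path).xattrs.get(EC_POLICY_XATTR)


if __name__ == "__main__":
    ns = Namespace()
    ns.mkdir("/cold")
    ns.mkdir("/small")
    ns.set_policy("/cold", ECPolicy(6, 3, 64 * 1024))
    ns.set_policy("/small", ECPolicy(3, 2, 64 * 1024))
    ns.create("/cold/a")
    ns.rename("/cold/a", "/small/a")
    print(ns.policy_of("/small/a"))  # still (6, 3): the rename does not re-encode
```

Copying the policy onto the file at creation is one possible design; with zones, the same rename from a (6,3) directory into a (3,2) directory would have to be rejected or would leave the file's layout inconsistent with its zone.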



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
