hadoop-hdfs-issues mailing list archives

From "Zhe Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode
Date Wed, 15 Apr 2015 00:35:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495442#comment-14495442
] 

Zhe Zhang commented on HDFS-7859:
---------------------------------

[~szetszwo] / [~drankye]: The [phasing plan | https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14391207&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14391207]
I posted might be a little confusing with regard to schemas. My apologies.

In the offline meetup on 03/31, we didn't reach a clear conclusion on how much of the schema work
to include before merging. Therefore I left it in phase I, but marked it as optional. My thought
was that we could make a better decision after observing how fast the work could proceed.
Up to this point I think this thread is going pretty well and it seems we can have a multi-schema
implementation when other HDFS-7285 tasks are done (see details below).

Good [questions | https://issues.apache.org/jira/browse/HDFS-7859?focusedCommentId=14494933&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14494933]
on schema design. I think we eventually need to answer them in the broader scope of HDFS-7337.
IIUC HDFS-7859 / HDFS-7866 are not touching most of the tricky scenarios. Based on Kai's latest
[comment | https://issues.apache.org/jira/browse/HDFS-7866?focusedCommentId=14494050&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14494050],
HDFS-7866 will mostly handle _default_ schemas embedded in the {{ECSchemaManager}} code. 

The patch under this JIRA handles saving / loading these default schemas in fsimage. I think
this is necessary even without loading custom schemas from XML. Otherwise we cannot guarantee
the NameNode which loads the fsimage has the same default schemas as the NameNode which saved
it. It is obviously even more necessary when we add custom schemas. The logic in the patch
is quite straightforward; it's mostly about serializing / deserializing schemas.
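
To illustrate the kind of save / load round trip described above, here is a minimal sketch. It is not the code in the attached patch and it does not use the real fsimage format; the {{SchemaRoundTrip}} class, the {{Schema}} fields, and the byte layout are all assumptions made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not the HDFS-7859 patch, not the fsimage format.
final class SchemaRoundTrip {

  /** Simplified stand-in for an EC schema; the field names are assumptions. */
  static final class Schema {
    final String name;
    final String codec;
    final int dataUnits;
    final int parityUnits;

    Schema(String name, String codec, int dataUnits, int parityUnits) {
      this.name = name;
      this.codec = codec;
      this.dataUnits = dataUnits;
      this.parityUnits = parityUnits;
    }
  }

  /** Serializes the schemas as a count followed by each schema's fields. */
  static byte[] save(List<Schema> schemas) {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buf)) {
      out.writeInt(schemas.size());
      for (Schema s : schemas) {
        out.writeUTF(s.name);
        out.writeUTF(s.codec);
        out.writeInt(s.dataUnits);
        out.writeInt(s.parityUnits);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buf.toByteArray();
  }

  /** Reads the schemas back in the same field order used by save(). */
  static List<Schema> load(byte[] image) {
    List<Schema> schemas = new ArrayList<>();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(image))) {
      int count = in.readInt();
      for (int i = 0; i < count; i++) {
        String name = in.readUTF();
        String codec = in.readUTF();
        int dataUnits = in.readInt();
        int parityUnits = in.readInt();
        schemas.add(new Schema(name, codec, dataUnits, parityUnits));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return schemas;
  }
}
```

The point of the sketch is that whichever process calls {{load}} gets exactly the schemas that were passed to {{save}}, regardless of the defaults compiled into its own code.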

So here's my proposal:
# Shrink this patch by removing the logic for modifying and removing schemas ({{ECSchemaManager#modifyECSchema}}
and {{OP_MODIFY_EC_SCHEMA}}). 
# Repurpose HDFS-7866 to focus on loading custom schemas from site xml files.

[~szetszwo], [~drankye], [~vinayrpet]: let me know if you agree with the above. If we are
all synced on this, how about moving this JIRA back to HDFS-7285 and keeping HDFS-7866 under
HDFS-8031?

> Erasure Coding: Persist EC schemas in NameNode
> ----------------------------------------------
>
>                 Key: HDFS-7859
>                 URL: https://issues.apache.org/jira/browse/HDFS-7859
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: Xinwei Qin 
>         Attachments: HDFS-7859.001.patch
>
>
> In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we persist EC schemas
> in NameNode centrally and reliably, so that EC zones can reference them by name efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
