hadoop-hdfs-issues mailing list archives

From "SammiChen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11082) Erasure Coding : Provide replicated EC policy to just replicating the files
Date Fri, 04 Aug 2017 02:34:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16113829#comment-16113829 ]

SammiChen commented on HDFS-11082:

Thanks [~andrew.wang] for the quick review! I just realized that the documentation has not been
updated; I will update it later.
> Also need to think about the behavior of getErasureCodingPolicy. Right now it returns "null"
> to mean replication. With this patch, a user would have to check both for "null" and "replication-1-2-64K"
> to know if it's replicated. It'd be good to choose one or the other to make it simpler for
> downstreams. "null" would be more compatible, and it'd hide the special replicated EC policy
> from non-admin users which I like.
Currently, the replication policy can only be set on a directory, not on a file, because in the
current file header format the replication factor and the EC policy ID share the same bits. So a
file can be either traditionally replicated or effectively EC; it cannot carry the replication EC policy.
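
To illustrate why shared bits force this either/or choice, here is a minimal sketch. The 1-bit flag, the 11-bit field width, and every name in it are assumptions made for illustration; it is not the actual HDFS file header code.

```java
// Hypothetical sketch: one header field holds EITHER a replication factor OR an
// EC policy ID. Bit widths and names are assumptions, not the real HDFS layout.
final class SharedHeaderSketch {
    private static final int VALUE_BITS = 11;                     // assumed width of the shared field
    private static final int VALUE_MASK = (1 << VALUE_BITS) - 1;  // low 11 bits
    private static final int STRIPED_FLAG = 1 << VALUE_BITS;      // assumed flag: set = EC, clear = replicated

    // Store a replication factor; the flag bit stays clear.
    public static int encodeReplicated(int replicationFactor) {
        checkRange(replicationFactor);
        return replicationFactor;
    }

    // Store an EC policy ID in the same bits and set the flag.
    public static int encodeStriped(int ecPolicyId) {
        checkRange(ecPolicyId);
        return STRIPED_FLAG | ecPolicyId;
    }

    public static boolean isStriped(int header) {
        return (header & STRIPED_FLAG) != 0;
    }

    // Returns the shared field; its meaning depends on isStriped(header).
    public static int value(int header) {
        return header & VALUE_MASK;
    }

    private static void checkRange(int v) {
        if (v < 0 || v > VALUE_MASK) {
            throw new IllegalArgumentException("value out of range: " + v);
        }
    }
}
```

In this layout there is no room to record both a replication factor and a policy ID for the same file, which is the constraint described above.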

For getErasureCodingPolicy on a directory, returning "null" and returning "replication-1-2-64k"
both have pros and cons. If it returns "null" for the replication EC policy:
Pros: 1. It's easy for downstream applications to check whether a directory is effectively EC or replicated.
Cons: 1. After the replication EC policy is set on a directory, it cannot be read back, so from the
user's point of view there is no way to unset the policy or even be aware of it. The user cannot
distinguish a traditionally replicated directory from a directory with the replication EC policy.
If it returns "replication-1-2-64k", the pros and cons are reversed. So it's a style choice:
one option gives all the information to the user and lets them decide; the other handles it
internally on behalf of the user.
I lean toward giving all the information to the user, but I'm OK with the "null" solution if it
will definitely bring more benefit to users. I think you have more experience with this, so you make the call.
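
The downstream check under discussion can be sketched as follows. This is only an illustration: the class and method names are hypothetical, and the policy is modelled as a plain name string rather than the real policy object. Only the "null means replication" convention and the "replication-1-2-64k" name come from this thread.

```java
// Hypothetical sketch of what a downstream caller would have to write under each option.
final class EcPolicyCheckSketch {
    // Name as written in this thread; the thread uses both "64K" and "64k",
    // so the comparison below ignores case.
    static final String REPLICATION_POLICY_NAME = "replication-1-2-64k";

    // Option A: the API returns null for the replication EC policy.
    // A single null check is enough.
    public static boolean isReplicatedIfNullReturned(String policyName) {
        return policyName == null;
    }

    // Option B: the API returns the replication EC policy by name.
    // Callers must handle both null and the special policy.
    public static boolean isReplicatedIfPolicyReturned(String policyName) {
        return policyName == null
            || REPLICATION_POLICY_NAME.equalsIgnoreCase(policyName);
    }
}
```

Option A keeps the caller's check to one comparison but hides the policy; option B exposes it at the cost of a second comparison in every downstream.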

> This is not directly related (and I think we discussed this a bit on another JIRA) but I'm
> not happy with our getECPolicy API right now. Right now it returns the effective EC policy.
> Without being able to query the actual EC policy, the behavior when setting/unsetting is kind
> of tricky. Should we add a "getActualECPolicy" API? Can be a follow-on JIRA.
Do you mean {{getErasureCodingPolicy}} when you say {{getECPolicy}}? I don't quite remember
when we discussed this issue. Can you give more hints?
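
For reference, the distinction between an "effective" and an "actual" policy can be sketched as below. This is a toy model under stated assumptions: policies are name strings kept in a map keyed by absolute, normalized paths, and the class and method names are hypothetical, not an existing or proposed HDFS API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: a policy set on a directory is inherited by everything under it.
final class EffectiveVsActualSketch {
    private final Map<String, String> explicitPolicies = new HashMap<>();

    public void setPolicy(String path, String policyName) {
        explicitPolicies.put(path, policyName);
    }

    // "Actual" policy: only what was set directly on this path, else null.
    public String getActualPolicy(String path) {
        return explicitPolicies.get(path);
    }

    // "Effective" policy: the nearest policy found walking from the path up to the root, else null.
    public String getEffectivePolicy(String path) {
        String current = path;
        while (current != null) {
            String policy = explicitPolicies.get(current);
            if (policy != null) {
                return policy;
            }
            current = parentOf(current);
        }
        return null;
    }

    // Assumes absolute, normalized paths such as "/a/b"; returns null above the root.
    private static String parentOf(String path) {
        if (path.equals("/")) {
            return null;
        }
        int idx = path.lastIndexOf('/');
        if (idx < 0) {
            return null;
        }
        return idx == 0 ? "/" : path.substring(0, idx);
    }
}
```

With only the effective lookup available, a caller cannot tell whether unsetting the policy on a path would change anything, since the value may come from an ancestor.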

The suggestions in all the other comments will be addressed in the next patch.

> Erasure Coding : Provide replicated EC policy to just replicating the files
> ---------------------------------------------------------------------------
>                 Key: HDFS-11082
>                 URL: https://issues.apache.org/jira/browse/HDFS-11082
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding
>            Reporter: Rakesh R
>            Assignee: SammiChen
>            Priority: Critical
>              Labels: hdfs-ec-3.0-must-do
>         Attachments: HDFS-11082.001.patch
> The idea of this jira is to provide a new {{replicated EC policy}} so that we can override
> the EC policy on a parent directory and go back to just replicating the files based on replication
> Thanks [~andrew.wang] for the [discussions|https://issues.apache.org/jira/browse/HDFS-11072?focusedCommentId=15620743&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15620743].
