Date: Fri, 4 Aug 2017 02:34:00 +0000 (UTC)
From: "SammiChen (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-11082) Erasure Coding : Provide replicated EC policy to just replicating the files

    [ https://issues.apache.org/jira/browse/HDFS-11082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16113829#comment-16113829 ]

SammiChen commented on HDFS-11082:
----------------------------------

Thanks [~andrew.wang] for the quick review! I just realized that the documentation has not been updated yet; I will update it later.

{quote}
Also need to think about the behavior of getErasureCodingPolicy. Right now it returns "null" to mean replication. With this patch, a user would have to check both for "null" and "replication-1-2-64K" to know if it's replicated. It'd be good to choose one or the other to make it simpler for downstreams. "null" would be more compatible, and it'd hide the special replicated EC policy from non-admin users which I like.
{quote}

Currently, the replication EC policy can only be set on a directory, not on a file, because in the current file header format the replication factor and the EC policy ID share the same bits. So a file is either traditionally replicated or effectively erasure-coded; it cannot carry the replication EC policy itself.
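To make the bit-sharing constraint concrete, here is a minimal Java sketch of a header word in which one field holds either a replication factor or an EC policy ID, selected by a layout flag. The class name, the 11-bit field width, and the flag position are assumptions for illustration, not taken from the HDFS source. Because the field can hold only one of the two values, there is no room to record a "replication EC policy" on a file alongside its replication factor.

```java
/**
 * Hypothetical sketch: one header field holds EITHER a replication factor
 * OR an EC policy ID, selected by a layout flag. Widths are assumptions.
 */
final class SharedRedundancyHeader {
    // Assumed width of the shared field (replication factor or EC policy ID).
    private static final int REDUNDANCY_BITS = 11;
    private static final long REDUNDANCY_MASK = (1L << REDUNDANCY_BITS) - 1;
    // Assumed layout flag just above the shared field: 1 = striped (EC), 0 = replicated.
    private static final long STRIPED_FLAG = 1L << REDUNDANCY_BITS;

    private SharedRedundancyHeader() {}

    /** Header for a replicated file: the shared field stores the replication factor. */
    static long replicated(int replication) {
        checkRange(replication);
        return replication;
    }

    /** Header for an erasure-coded file: the shared field stores the EC policy ID. */
    static long striped(int ecPolicyId) {
        checkRange(ecPolicyId);
        return STRIPED_FLAG | ecPolicyId;
    }

    static boolean isStriped(long header) {
        return (header & STRIPED_FLAG) != 0;
    }

    static int replication(long header) {
        if (isStriped(header)) {
            throw new IllegalStateException("striped header stores an EC policy ID, not a replication factor");
        }
        return (int) (header & REDUNDANCY_MASK);
    }

    static int ecPolicyId(long header) {
        if (!isStriped(header)) {
            throw new IllegalStateException("replicated header stores a replication factor, not an EC policy ID");
        }
        return (int) (header & REDUNDANCY_MASK);
    }

    private static void checkRange(int value) {
        if (value < 0 || value > REDUNDANCY_MASK) {
            throw new IllegalArgumentException("value does not fit in the shared field: " + value);
        }
    }

    public static void main(String[] args) {
        long rep = replicated(3);
        long ec = striped(1);
        System.out.println(isStriped(rep) + " " + replication(rep)); // false 3
        System.out.println(isStriped(ec) + " " + ecPolicyId(ec));    // true 1
    }
}
```

Reading the "wrong" half of the field throws here, which mirrors the point above: a header is either replicated or striped, never both.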
For getErasureCodingPolicy on a directory, returning either "null" or "replication-1-2-64k" has pros and cons.

If it returns "null" for the replication EC policy:
Pros:
1. It's easy for downstream applications to check whether a path is effectively EC or replicated.
Cons:
1. After the replication EC policy is set on a directory, it cannot be read back, so from the user's point of view there is no way to see the policy or to unset it. The user cannot distinguish a traditionally replicated directory from a directory with the replication EC policy.

If it returns "replication-1-2-64k", the pros and cons are reversed.

So it's a design choice: one option gives all the information to the user and lets them decide; the other handles it internally on the user's behalf. I lean toward giving users all the information, but I'm OK with the "null" solution if it clearly brings more benefit to users. You have more experience with this, so you make the call.

{quote}
This is not directly related (and I think we discussed this a bit on another JIRA) but I'm not happy with our getECPolicy API right now. Right now it returns the effective EC policy. Without being able to query the actual EC policy, the behavior when setting/unsetting is kind of tricky. Should we add an "getActualECPolicy" API? Can be a follow-on JIRA.
{quote}

Do you mean {{getErasureCodingPolicy}} when you say {{getECPolicy}}? I don't quite remember when we discussed this issue. Can you give me more hints?

The suggestions in all the other comments will be addressed in the next patch.
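The "null" versus "replication-1-2-64k" trade-off for getErasureCodingPolicy can be sketched in a few lines of Java. This is a hypothetical model, not the HDFS client API: `Policy`, `hideReplication`, and the two `isReplicatedUnder...` checks are illustrative names, and "RS-example" is a placeholder policy name. It shows both costs described in the comment: under the "null" convention a directory with no policy and a directory explicitly set to the replication policy look identical, while under the "expose" convention every caller needs a two-part check.

```java
import java.util.Objects;

/**
 * Hypothetical model of the two return conventions discussed for
 * getErasureCodingPolicy. Not the HDFS API; names are illustrative.
 */
final class EcReturnConventions {
    /** Minimal stand-in for an EC policy; only the name matters here. */
    static final class Policy {
        final String name;
        Policy(String name) { this.name = Objects.requireNonNull(name); }
    }

    // Replication policy name as written in the discussion.
    static final Policy REPLICATION = new Policy("replication-1-2-64k");
    // Placeholder for some striped policy; the name is hypothetical.
    static final Policy RS_EXAMPLE = new Policy("RS-example");

    static boolean isReplicationPolicy(Policy p) {
        return p != null && REPLICATION.name.equals(p.name);
    }

    /** Convention "null": hide the replication policy behind null. */
    static Policy hideReplication(Policy effective) {
        return isReplicationPolicy(effective) ? null : effective;
    }

    /** Downstream check when the API follows the "null" convention. */
    static boolean isReplicatedUnderNull(Policy returned) {
        return returned == null;
    }

    /** Downstream check when the API exposes the replication policy as-is. */
    static boolean isReplicatedUnderExpose(Policy returned) {
        return returned == null || isReplicationPolicy(returned);
    }

    public static void main(String[] args) {
        Policy neverSet = null;              // directory that never had a policy
        Policy replicationSet = REPLICATION; // directory explicitly set to the replication policy

        // "null" convention: both directories return null, so they are indistinguishable.
        System.out.println(hideReplication(neverSet) == hideReplication(replicationSet)); // true
        // "expose" convention: the two-part check recognizes the replication policy...
        System.out.println(isReplicatedUnderExpose(replicationSet)); // true
        // ...while a plain null check misses it.
        System.out.println(isReplicatedUnderNull(replicationSet));   // false
    }
}
```

The last line is the downstream compatibility concern from the quote: a client written against the old "null means replication" contract would treat a directory carrying the exposed replication policy as not replicated.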
> Erasure Coding : Provide replicated EC policy to just replicating the files
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-11082
>                 URL: https://issues.apache.org/jira/browse/HDFS-11082
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding
>            Reporter: Rakesh R
>            Assignee: SammiChen
>            Priority: Critical
>              Labels: hdfs-ec-3.0-must-do
>         Attachments: HDFS-11082.001.patch
>
>
> The idea of this jira is to provide a new {{replicated EC policy}} so that we can override the EC policy on a parent directory and go back to just replicating the files based on replication factors.
> Thanks [~andrew.wang] for the [discussions|https://issues.apache.org/jira/browse/HDFS-11072?focusedCommentId=15620743&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15620743].
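The issue description proposes letting a directory override an EC policy inherited from its parent and fall back to plain replication. Below is a minimal Java sketch of that resolution, assuming policies are inherited from the nearest ancestor directory that has one set and that the replication policy name means "replicate". `EcPolicyResolver`, its methods, and the string-path model are hypothetical; `getActualPolicy` versus `getEffectivePolicy` loosely mirrors the "actual vs. effective" distinction raised in the quoted review, not any existing HDFS call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical sketch of EC policy inheritance with a replication override.
 * Paths are plain absolute strings; this is not the HDFS namespace code.
 */
final class EcPolicyResolver {
    // Replication policy name as written in the discussion.
    static final String REPLICATION = "replication-1-2-64k";

    // Policies explicitly set on directories, keyed by absolute path (no trailing slash).
    private final Map<String, String> explicitPolicies = new HashMap<>();

    void setPolicy(String dir, String policyName) {
        explicitPolicies.put(requireAbsolute(dir), Objects.requireNonNull(policyName));
    }

    /** "Actual" policy: only what was set on this exact path, else null. */
    String getActualPolicy(String path) {
        return explicitPolicies.get(requireAbsolute(path));
    }

    /** "Effective" policy: the nearest ancestor-or-self with an explicit policy, else null. */
    String getEffectivePolicy(String path) {
        String p = requireAbsolute(path);
        while (true) {
            String set = explicitPolicies.get(p);
            if (set != null) {
                return set;
            }
            if (p.equals("/")) {
                return null;
            }
            int slash = p.lastIndexOf('/');
            p = (slash == 0) ? "/" : p.substring(0, slash);
        }
    }

    /** Striped only if some EC policy is in effect and it is not the replication override. */
    boolean isStriped(String path) {
        String effective = getEffectivePolicy(path);
        return effective != null && !effective.equals(REPLICATION);
    }

    private static String requireAbsolute(String path) {
        if (path == null || !path.startsWith("/")) {
            throw new IllegalArgumentException("expected an absolute path: " + path);
        }
        return path;
    }

    public static void main(String[] args) {
        EcPolicyResolver ns = new EcPolicyResolver();
        ns.setPolicy("/data", "RS-example");     // hypothetical EC policy on the parent
        ns.setPolicy("/data/logs", REPLICATION); // override: go back to plain replication

        System.out.println(ns.isStriped("/data/part-0"));                // true
        System.out.println(ns.isStriped("/data/logs/app.log"));          // false
        System.out.println(ns.getActualPolicy("/data/logs/app.log"));    // null
        System.out.println(ns.getEffectivePolicy("/data/logs/app.log")); // replication-1-2-64k
    }
}
```

In this model the override lives only on directories, consistent with the comment's point that a file header cannot record the replication EC policy; files simply pick up whatever their nearest configured ancestor says.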