Date: Wed, 29 Mar 2017 01:34:42 +0000 (UTC)
From: "Kai Zheng (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] [Commented] (HADOOP-13200) Seeking a better approach allowing to customize and configure erasure coders

    [ https://issues.apache.org/jira/browse/HADOOP-13200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946376#comment-15946376 ]

Kai Zheng commented on HADOOP-13200:
------------------------------------

Thanks, Andrew, for the thoughtful comments.

bq. I saw in a comment that the raw coders are stateless ...

Sorry, this needs some clarification; it's only partly true:
1) When I said the coders are stateless, I meant that, given a coder instance, the encode/decode call is stateless. That is, for a given group of data to encode or decode, concurrent threads can call encode/decode on the same instance to speed things up.
2) The coder instance itself has state, specific to the schema and configuration (and, for a decoder, the erasure parameters). The biggest concern with merging the encoder and decoder is that their states get mixed and incur some runtime overhead. For example, both the encoder and the decoder may keep a big coding matrix in the CPU core cache; when they are merged, memory consumption can double (this can be avoided with separate code paths, but that complicates the code).
On the HDFS side, a coder instance serves just one role, either encoding or decoding, never both at the same time.

bq. Maybe this is what Colin wanted, since the factory classes look trivial by themselves.

I guess you're right; that's most likely what Colin had in mind. The factory classes are trivial right now, but I think they can evolve to do more:
1) In HADOOP-13665, I think we can implement the fallback elegantly here: creating an encoder/decoder can fail, and the next one configured in a list can then be tried;
2) Quite some time ago, when I played with micro-benchmarking the coders, I found that caching coder instances helps performance, and it's better to do that here than elsewhere, such as in {{CodecUtil}}.

bq. I experimented by putting the NativeRS raw encoder and raw decoder into their Factory class, and it looks okay since they're pretty small.

That's a very interesting experiment. Yes, the native RS encoder and decoder are small, but other coders may not be. I expect coders and coder factories to grow over time: if we want to support incremental encoding/decoding, more code will be added to the coders. As the HH codec shows, when someone supports a complex codec, the encode/decode logic can be much more complex.

bq. We also should rename RSRawEncoderLegacy to RSLegacyRawEncoder ...

Right, agreed.

bq. it seems like we should be creating via the appropriate factory whenever possible.

Couldn't agree more.

bq. Overall though I think the current system is okay. The factory is the single entry point to configuring a RawCoder.

I'm glad the current system works for you. Do you think it's good to fix the problems above as part of this issue?

Thanks!
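
To make the fallback and caching points about the factory more concrete, here is a rough Java sketch. It is only an illustration under assumptions: {{RawEncoder}}, {{RawEncoderFactory}} and {{FallbackEncoderProvider}} are hypothetical names invented for this example, not the actual Hadoop raw coder classes, and the cache key is just the data/parity unit counts. Caching one instance per key relies on the point made earlier that encode/decode calls on a given instance are stateless.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins for illustration only; NOT Hadoop's actual raw coder types.
interface RawEncoder {
    String name();
}

interface RawEncoderFactory {
    // May throw if the implementation is unavailable,
    // for example when a native library failed to load.
    RawEncoder createEncoder(int dataUnits, int parityUnits);
}

// Tries the configured factories in order and caches the first encoder that
// could be created for each (dataUnits, parityUnits) pair.
final class FallbackEncoderProvider {
    private final List<RawEncoderFactory> factories;
    private final Map<String, RawEncoder> cache = new ConcurrentHashMap<>();

    FallbackEncoderProvider(List<RawEncoderFactory> factories) {
        this.factories = new ArrayList<>(factories);
    }

    RawEncoder getEncoder(int dataUnits, int parityUnits) {
        String key = dataUnits + ":" + parityUnits;
        // If creation throws, computeIfAbsent records no mapping,
        // so a later call will retry the whole list.
        return cache.computeIfAbsent(key, k -> createWithFallback(dataUnits, parityUnits));
    }

    private RawEncoder createWithFallback(int dataUnits, int parityUnits) {
        RuntimeException last = null;
        for (RawEncoderFactory factory : factories) {
            try {
                return factory.createEncoder(dataUnits, parityUnits);
            } catch (RuntimeException e) {
                last = e; // remember the failure and try the next configured factory
            }
        }
        throw new IllegalStateException("No configured factory could create an encoder", last);
    }
}
```

With a list such as (native factory, pure Java factory), a failure in the first falls through to the second, and repeated requests for the same schema get the same cached instance. The decoder side would look the same, with the erasure parameters handled per decode call or added to the key.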

> Seeking a better approach allowing to customize and configure erasure coders
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-13200
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13200
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>            Priority: Blocker
>              Labels: hdfs-ec-3.0-must-do
>
> This is a follow-on task for HADOOP-13010 as discussed over there. There may be some better approach allowing to customize and configure erasure coders than the current having raw coder factory, as [~cmccabe] suggested. Will copy the relevant comments here to continue the discussion.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org