Date: Wed, 22 Apr 2015 00:36:59 +0000 (UTC)
From: "Zhe Zhang (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] [Commented] (HADOOP-11847) Enhance raw coder allowing to read least required inputs in decoding

    [ https://issues.apache.org/jira/browse/HADOOP-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506111#comment-14506111 ]

Zhe Zhang commented on HADOOP-11847:
------------------------------------

Thanks Kai for the patch. Please find my review below:
# We try to decode all null slots in the input arrays. I'm not sure if this will cause unnecessary computation.
# Could you explain this change? Shouldn't the first argument be {{numDataUnits}}?
{code}
-    xorRawDecoder.initialize(getNumDataUnits(), 1, getChunkSize());
+    xorRawDecoder.initialize(getNumDataUnits() + getNumParityUnits() - 1,
+        1, getChunkSize());
{code}
# {{checkParameters}} goes through the input arrays once, and computing {{badCount}} makes another pass. Can we just assert {{badCount + erasedIndexes.length == numDataUnits}}?
# {{ensureWhenUseXXX}} needs some Javadoc. Maybe also add a better explanation than {{// Lazy on demand}}?
# These variable names look confusing: {{decodingDirectBufferInputs}} vs. {{decodingDirectBuffersForInput}}, and {{decodingDirectBufferOutputs}} vs. {{decodingDirectBuffersForOutput}}.
# Is {{decodingByteArrayBuffersForInput}} always filled with zero bytes? I don't see where it's filled with actual data.

> Enhance raw coder allowing to read least required inputs in decoding
> --------------------------------------------------------------------
>
>                 Key: HADOOP-11847
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11847
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: io
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>         Attachments: HADOOP-11847-v1.patch
>
>
> This is to enhance the raw erasure coder so that it reads only the least required inputs while decoding. It will also refine and document the relevant APIs for better understanding and usage. Using the least required inputs may add computation overhead, but it will possibly perform better overall since less network traffic and disk I/O are involved.
> This was already planned, but I was reminded of it by [~zhz]'s question raised in HDFS-7678, which is also copied here:
> bq. Kai Zheng, I have a question about decoding: in a (6+3) schema, if block #2 is missing, and I want to repair it with blocks 0, 1, 3, 4, 5, 8, how should I construct the inputs to RawErasureDecoder#decode?
> With this work, hopefully the answer to the above question will be obvious.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
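
For readers following the (6+3) question quoted above, here is a minimal sketch of one way the inputs could be laid out. It assumes the convention hinted at in the first review item: the inputs array spans all data and parity units, and a null slot marks a unit that was erased or simply not read. The class and method names ({{DecodeInputsSketch}}, {{buildInputs}}, {{countNonNull}}) are hypothetical and are not taken from the patch or from the Hadoop API; the check that exactly {{numDataUnits}} inputs are supplied is likewise an assumption about what "least required" means here.

```java
// Hypothetical sketch: these names are illustrative, not the Hadoop RawErasureDecoder API.
class DecodeInputsSketch {

    /**
     * Builds a full-width inputs array: data units first, then parity units.
     * Slots for units that were not read (erased or simply skipped) stay null.
     */
    static byte[][] buildInputs(int numDataUnits, int numParityUnits,
                                int[] readIndexes, int chunkSize) {
        byte[][] inputs = new byte[numDataUnits + numParityUnits][];
        for (int idx : readIndexes) {
            // Placeholder buffer standing in for the chunk read from unit idx.
            inputs[idx] = new byte[chunkSize];
        }
        return inputs;
    }

    /** Counts the units that were actually supplied (non-null slots). */
    static int countNonNull(byte[][] inputs) {
        int count = 0;
        for (byte[] input : inputs) {
            if (input != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int numDataUnits = 6;
        int numParityUnits = 3;
        int[] readIndexes = {0, 1, 3, 4, 5, 8}; // units chosen for the repair
        int[] erasedIndexes = {2};              // unit to reconstruct

        byte[][] inputs = buildInputs(numDataUnits, numParityUnits, readIndexes, 4);

        // Assumed precondition: exactly numDataUnits units are supplied.
        if (countNonNull(inputs) != numDataUnits) {
            throw new IllegalStateException("expected exactly " + numDataUnits + " inputs");
        }
        // An erased unit must not appear among the supplied inputs.
        for (int erased : erasedIndexes) {
            if (inputs[erased] != null) {
                throw new IllegalStateException("erased unit " + erased + " has an input");
            }
        }
        for (int i = 0; i < inputs.length; i++) {
            System.out.println("unit " + i + ": " + (inputs[i] == null ? "null" : "read"));
        }
    }
}
```

Under this assumed layout, slots 2, 6 and 7 are null: slot 2 because it is the erased unit, slots 6 and 7 because those parity units are not read. A real decoder would receive this array together with {{erasedIndexes}} and the output buffers.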