Date: Tue, 5 May 2015 00:27:06 +0000 (UTC)
From: "Kai Zheng (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-8281) Erasure Coding: implement parallel stateful reading for striped layout

[ https://issues.apache.org/jira/browse/HDFS-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527629#comment-14527629 ]

Kai Zheng commented on HDFS-8281:
---------------------------------

Thanks for the great work and discussion here!

bq. One question is why we choose 256KB as the cell size instead of the original 64KB?
bq. Kai maybe you can remind us the reason?

Originally, 64KB was used as an HDFS-side constant for the striping cell size, while 256KB was used in ECSchema in the codec framework as the erasure coding chunk size. The two evolved independently.
When we applied ECSchema to replace the hard-coded values, 256KB was adopted to make all the places consistent. In my view, 64KB or smaller may be better for striping small files, while 256KB or larger may be better for erasure coding big files. Our test records indicate that with a larger chunk size, such as 32MB, native coders perform significantly better. Although it's hard to choose one good default value, we support configurable schemas, and the chunk size is configurable as part of a schema, so we may not need to worry about it too much. What do you think? Thanks.

> Erasure Coding: implement parallel stateful reading for striped layout
> ----------------------------------------------------------------------
>
>                 Key: HDFS-8281
>                 URL: https://issues.apache.org/jira/browse/HDFS-8281
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>             Fix For: HDFS-7285
>
>         Attachments: HDFS-8281-HDFS-7285.001.patch, HDFS-8281-HDFS-7285.001.patch, HDFS-8281-HDFS-7285.002.patch, HDFS-8281.000.patch
>
>
> This jira aims to support parallel reading for stateful read in {{DFSStripedInputStream}}.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
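The cell-size trade-off discussed in the comment comes down to striping arithmetic: the cell size determines how a logical file offset maps onto the data blocks of a striped block group, and hence how many blocks a small read touches. A minimal sketch of that mapping, in Python for illustration only (this is not the actual DFSStripedInputStream code; the RS(6,3) layout, the 256KB default, and the `locate` helper are all assumptions made here):

```python
CELL_SIZE = 256 * 1024   # 256KB, the value adopted from ECSchema (64KB was the older HDFS constant)
NUM_DATA_BLOCKS = 6      # e.g. an RS(6,3) schema; chosen here purely for illustration

def locate(offset, cell_size=CELL_SIZE, num_data_blocks=NUM_DATA_BLOCKS):
    """Map a logical byte offset to (block_index, offset_in_block) in a striped group.

    Cells are laid out round-robin across the data blocks: cell 0 goes to
    block 0, cell 1 to block 1, ..., then the next stripe starts again at
    block 0.
    """
    cell_index = offset // cell_size               # which cell, counting across the whole file
    stripe_index = cell_index // num_data_blocks   # which full stripe the cell belongs to
    block_index = cell_index % num_data_blocks     # which data block within the group
    offset_in_block = stripe_index * cell_size + offset % cell_size
    return block_index, offset_in_block
```

With a smaller cell size, a fixed-size read spans more cells and therefore touches more blocks in parallel (good for small-file striping); with a larger cell size, each coder invocation processes a bigger contiguous chunk (good for big-file coding throughput), which is the trade-off the comment describes.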