Date: Tue, 5 May 2015 00:27:06 +0000 (UTC)
From: "Kai Zheng (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-8281) Erasure Coding: implement parallel stateful reading for striped layout

[ https://issues.apache.org/jira/browse/HDFS-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527629#comment-14527629 ]

Kai Zheng commented on HDFS-8281:
---------------------------------

Thanks for the great work and discussion here!

bq. One question is why we choose 256KB as the cell size instead of the original 64KB?
bq. Kai maybe you can remind us the reason?

Originally, 64KB was used as an HDFS-side constant for the striping cell size, while 256KB was used in ECSchema in the codec framework as the erasure coding chunk size. The two evolved independently.
When we applied ECSchema to replace the hard-coded values, 256KB was adopted to make all the places consistent. In my view, 64KB or smaller may be better for striping small files, while 256KB or larger may be better for erasure coding big files. Our test records indicate that with a larger chunk size, such as 32MB, native coders perform significantly better. Although it's hard to choose one good default value, we support configurable schemas, and the chunk size is configurable as part of a schema, so we may not need to worry about it too much. What do you think? Thanks.

> Erasure Coding: implement parallel stateful reading for striped layout
> ----------------------------------------------------------------------
>
>                 Key: HDFS-8281
>                 URL: https://issues.apache.org/jira/browse/HDFS-8281
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>             Fix For: HDFS-7285
>
>         Attachments: HDFS-8281-HDFS-7285.001.patch, HDFS-8281-HDFS-7285.001.patch, HDFS-8281-HDFS-7285.002.patch, HDFS-8281.000.patch
>
>
> This jira aims to support parallel reading for stateful read in {{DFSStripedInputStream}}.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
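The cell-size trade-off discussed in the comment comes down to striping arithmetic: the cell size determines how a logical file offset maps onto the data blocks of a striped block group, and hence how many blocks a small read touches. A minimal sketch of that mapping, in Python for illustration only (this is not the actual DFSStripedInputStream code; the RS(6,3) layout, the 256KB default, and the `locate` helper are all assumptions made here):

```python
CELL_SIZE = 256 * 1024   # 256KB, the value adopted from ECSchema (64KB was the older HDFS constant)
NUM_DATA_BLOCKS = 6      # e.g. an RS(6,3) schema; chosen here purely for illustration

def locate(offset, cell_size=CELL_SIZE, num_data_blocks=NUM_DATA_BLOCKS):
    """Map a logical byte offset to (block_index, offset_in_block) in a striped group.

    Cells are laid out round-robin across the data blocks: cell 0 goes to
    block 0, cell 1 to block 1, ..., then the next stripe starts again at
    block 0.
    """
    cell_index = offset // cell_size               # which cell, counting across the whole file
    stripe_index = cell_index // num_data_blocks   # which full stripe the cell belongs to
    block_index = cell_index % num_data_blocks     # which data block within the group
    offset_in_block = stripe_index * cell_size + offset % cell_size
    return block_index, offset_in_block
```

With a smaller cell size, a fixed-size read spans more cells and therefore touches more blocks in parallel (good for small-file striping); with a larger cell size, each coder invocation processes a bigger contiguous chunk (good for big-file coding throughput), which is the trade-off the comment describes.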