Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Fri, 1 May 2015 18:57:07 +0000 (UTC)
From: "Jing Zhao (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.12825702.1430258000000.49338.1430506627958@Atlassian.JIRA>
In-Reply-To: <JIRA.12825702.1430258000000@Atlassian.JIRA>
References: <JIRA.12825702.1430258000000@Atlassian.JIRA>
 <JIRA.12825702.1430258000316@arcas>
Subject: [jira] [Commented] (HDFS-8281) Erasure Coding: implement parallel
 stateful reading for striped layout
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HDFS-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523684#comment-14523684 ] 

Jing Zhao commented on HDFS-8281:
---------------------------------

Thanks for the comments, Zhe!

For comments 4 and 5, thanks for the explanation of the "short read"! Some thoughts:
# At the current stage, I think our main use case is still sequential reads, and reading in parallel lets us serve such requests with better throughput. This means the basic unit of each individual read should still be a cell.
# The tradeoff here is between throughput and the worst-case latency of serving a single read request. A parallel read may be delayed by a slow or unavailable DN, but we have to handle slow or unavailable DNs during reads in any case. The difference is the size of the stripe read for decoding: say we return only 64KB at a time (assuming for simplicity that it all comes from the same DN); if that data is unavailable, a corresponding stripe of 64KB * 6 has to be read. In the current case we read 256KB * 6 (and if the cell size is 64KB, the two are actually the same).
# For the possible decoding use case we need a buffer to keep the data that has been served. If the latency of reading a complete stripe becomes a real concern, a simple improvement is to read less data into the buffer each time without changing the buffer size. Without detailed benchmark data, though, I'm not sure we want to add this logic immediately. I think this is something we must explore during performance testing, and we can make the improvement as follow-on work.
# One question: why did we choose 256KB as the cell size instead of the original 64KB?
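
To make the arithmetic in the second point concrete, here is a minimal sketch, not code from the patch. The class and method names are hypothetical, the six data blocks per stripe are taken from the "* 6" above, and the round-robin placement of cells across data blocks is an assumption about the striped layout.

```java
// Hypothetical helper for illustration only; not part of DFSStripedInputStream.
final class StripeReadCost {
    // Assumption: six data blocks per stripe, matching the "* 6" in the comment.
    static final int DATA_BLOCKS = 6;

    /** Bytes fetched to decode one missing cell: one cell from each of dataBlocks sources. */
    public static long decodeReadBytes(long cellSize, int dataBlocks) {
        return cellSize * dataBlocks;
    }

    /**
     * Index of the data block that holds a logical file offset, assuming cells
     * are placed round-robin across the data blocks of a block group.
     */
    public static int blockIndexOf(long offset, long cellSize, int dataBlocks) {
        return (int) ((offset / cellSize) % dataBlocks);
    }

    public static void main(String[] args) {
        long kb = 1024L;
        // Decoding cost when 64KB is served per read vs. a full 256KB cell.
        System.out.println(decodeReadBytes(64 * kb, DATA_BLOCKS));   // 393216
        System.out.println(decodeReadBytes(256 * kb, DATA_BLOCKS));  // 1572864
        // A sequential read of six consecutive 64KB cells touches six different blocks.
        for (int cell = 0; cell < DATA_BLOCKS; cell++) {
            System.out.println(blockIndexOf(cell * 64 * kb, 64 * kb, DATA_BLOCKS));
        }
    }
}
```

Under these assumptions a sequential read of one stripe spans all six data blocks, which is why it can be issued in parallel, and the decoding cost scales linearly with the amount returned per read.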

I will update the patch later to address comments 1-3.


> Erasure Coding: implement parallel stateful reading for striped layout
> ----------------------------------------------------------------------
>
>                 Key: HDFS-8281
>                 URL: https://issues.apache.org/jira/browse/HDFS-8281
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HDFS-8281-HDFS-7285.001.patch, HDFS-8281-HDFS-7285.001.patch, HDFS-8281.000.patch
>
>
> This jira aims to support parallel reading for stateful read in {{DFSStripedInputStream}}.

