hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8281) Erasure Coding: implement parallel stateful reading for striped layout
Date Fri, 01 May 2015 18:57:07 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523684#comment-14523684 ]

Jing Zhao commented on HDFS-8281:

Thanks for the comments, Zhe!

For comments 4 and 5, thanks for the explanation about the "short read"! Here are some of my thoughts:
# At the current stage, I think our main use case is still sequential reads, and serving them
with parallel reads lets us achieve better throughput. This means that the basic unit for each
individual read should still be a cell.
# The tradeoff here is actually between throughput and the worst-case latency of serving a
single read request. A parallel read may be delayed by a slow or unavailable DN, but we always
have to handle slow/unavailable DNs during a read anyway. The difference lies in the stripe size
used for decoding: say each time we return only 64KB (assuming for simplicity that it all comes
from the same DN); if that data is unavailable, a corresponding 64KB * 6 stripe will be read.
In the current approach we read 256KB * 6 (and if the cell size is 64KB, the two are actually
the same).
# For the possible decoding use case, we need a buffer to keep the data that has already been
served. If the latency of reading a complete stripe becomes a real concern, a simple improvement
is to read less data into the buffer each time without changing the buffer size. However,
without detailed benchmark data I'm not sure whether we want to add this logic immediately. I
think this is something we must explore during performance testing, and we can make
improvements as follow-on work.
# One question: why did we choose 256KB as the cell size instead of the original 64KB?
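A rough sketch of the arithmetic behind points 2 and 3, assuming (as the "* 6" above suggests) a schema with 6 data units, and that a degraded read fetches the same byte range from 6 units to decode. The class and method names (DegradedReadCost, bytesFetchedForDecode, roundsToFillBuffer) are hypothetical and do not come from the patch:

```java
/**
 * Hypothetical back-of-the-envelope model of the degraded-read tradeoff.
 * Assumptions: 6 data units, and decoding a missing range needs the
 * matching range from 6 units. Not taken from the HDFS-8281 patch.
 */
public final class DegradedReadCost {
  static final int DATA_UNITS = 6; // assumed, per the "* 6" in the discussion
  static final int KB = 1024;

  /** Bytes fetched to reconstruct one missing range of lenPerUnit bytes. */
  static long bytesFetchedForDecode(int lenPerUnit, int dataUnits) {
    // The decoder reads the same-sized range from dataUnits units.
    return (long) lenPerUnit * dataUnits;
  }

  /**
   * Rounds needed to fill a fixed-size stripe buffer (cellSize per unit)
   * when each round reads only lenPerUnit bytes per unit: ceil(cellSize / lenPerUnit).
   */
  static int roundsToFillBuffer(int cellSize, int lenPerUnit) {
    return (cellSize + lenPerUnit - 1) / lenPerUnit;
  }

  public static void main(String[] args) {
    System.out.println(bytesFetchedForDecode(64 * KB, DATA_UNITS));  // 393216 (64KB * 6)
    System.out.println(bytesFetchedForDecode(256 * KB, DATA_UNITS)); // 1572864 (256KB * 6)
    System.out.println(roundsToFillBuffer(256 * KB, 64 * KB));       // 4
  }
}
```

Under these assumptions, reading 64KB per unit per round keeps the 256KB * 6 buffer unchanged but caps each degraded fetch at 64KB * 6, at the cost of four rounds instead of one to fill the buffer.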

I will update the patch later to address comments 1-3.

> Erasure Coding: implement parallel stateful reading for striped layout
> ----------------------------------------------------------------------
>                 Key: HDFS-8281
>                 URL: https://issues.apache.org/jira/browse/HDFS-8281
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HDFS-8281-HDFS-7285.001.patch, HDFS-8281-HDFS-7285.001.patch, HDFS-8281.000.patch
> This jira aims to support parallel reading for stateful read in {{DFSStripedInputStream}}.
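To illustrate reading with the cell as the basic unit, here is a minimal sketch of splitting a sequential read of a block group into cell-sized chunks and fetching them in parallel. It assumes a plain round-robin cell layout over the data units, ignores parity and decoding entirely, and uses an in-memory stand-in for DataNode reads. CellParallelRead, CellChunk, BlockReader, splitIntoCells and parallelRead are hypothetical names, not the {{DFSStripedInputStream}} API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch of cell-based parallel reading; not the HDFS implementation. */
public final class CellParallelRead {

  /** A piece of the request that lies entirely inside one cell. */
  record CellChunk(int unitIndex, long offsetInBlock, int length, int offsetInResult) {}

  /** Stand-in for reading a range from one internal block (one DN). */
  interface BlockReader {
    byte[] read(int unitIndex, long offsetInBlock, int length);
  }

  /** Maps [start, start + len) of the logical block group onto per-cell chunks. */
  static List<CellChunk> splitIntoCells(long start, int len, int cellSize, int dataUnits) {
    List<CellChunk> chunks = new ArrayList<>();
    long pos = start;
    int done = 0;
    while (done < len) {
      long cellIndex = pos / cellSize;             // global cell number
      int offsetInCell = (int) (pos % cellSize);
      int unit = (int) (cellIndex % dataUnits);    // internal block holding this cell
      long stripe = cellIndex / dataUnits;         // stripe row of this cell
      int chunkLen = Math.min(cellSize - offsetInCell, len - done);
      chunks.add(new CellChunk(unit, stripe * cellSize + offsetInCell, chunkLen, done));
      pos += chunkLen;
      done += chunkLen;
    }
    return chunks;
  }

  /** Fetches every chunk concurrently and reassembles them in logical order. */
  static byte[] parallelRead(long start, int len, int cellSize, int dataUnits,
                             BlockReader reader, ExecutorService pool) {
    byte[] result = new byte[len];
    List<CompletableFuture<Void>> futures = new ArrayList<>();
    for (CellChunk c : splitIntoCells(start, len, cellSize, dataUnits)) {
      futures.add(CompletableFuture.runAsync(() -> {
        byte[] data = reader.read(c.unitIndex(), c.offsetInBlock(), c.length());
        // Chunks write to disjoint regions of result, so no locking is needed.
        System.arraycopy(data, 0, result, c.offsetInResult(), c.length());
      }, pool));
    }
    // Waits for all cells: the slowest unit bounds the latency of the whole request.
    CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
    return result;
  }

  public static void main(String[] args) {
    int cellSize = 4, dataUnits = 3;               // tiny values for illustration only
    byte[] logical = new byte[24];
    for (int i = 0; i < logical.length; i++) logical[i] = (byte) i;
    // Build internal blocks by striping the logical bytes cell by cell, round-robin.
    byte[][] blocks = new byte[dataUnits][logical.length / dataUnits];
    for (int i = 0; i < logical.length; i++) {
      int cell = i / cellSize;
      blocks[cell % dataUnits][(cell / dataUnits) * cellSize + i % cellSize] = logical[i];
    }
    BlockReader reader = (u, off, n) -> Arrays.copyOfRange(blocks[u], (int) off, (int) off + n);
    ExecutorService pool = Executors.newFixedThreadPool(dataUnits);
    try {
      byte[] out = parallelRead(5, 14, cellSize, dataUnits, reader, pool);
      System.out.println(Arrays.toString(out));    // expected: logical bytes 5..18 in order
    } finally {
      pool.shutdown();
    }
  }
}
```

The design choice this mirrors is the one argued in point 1: each request is served at cell granularity across units for throughput, while the final join means a single slow DN still delays the whole read, which is the latency side of the tradeoff in point 2.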

This message was sent by Atlassian JIRA
