hadoop-hdfs-issues mailing list archives

From "Walter Su (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8481) Erasure coding: remove workarounds in client side stripped blocks recovering
Date Fri, 29 May 2015 01:44:17 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564053#comment-14564053
] 

Walter Su commented on HDFS-8481:
---------------------------------

This is the typical user logic for calling pread. The {{buf}} is reused until the entire file
has been read.
{code}
byte[] buf = new byte[4096];
int readLen;
while ((readLen = in.read(buf)) != -1) {
..
}
{code}

Assume we have a 768 MB file (128 MB * 6) that contains exactly 1 block group. We lost one block,
so we have to decode until all 768 MB of data has been read.
{code}
    byte[][] decodeInputs =
        new byte[dataBlkNum + parityBlkNum][(int) alignedStripe.getSpanInBlock()];
{code}
For every {{alignedStripe}} being read we need a new {{decodeInputs}}. Every time the user
calls pread, we get multiple new {{alignedStripe}}s. Every time the user calls stateful read,
we get 1~3 new {{alignedStripe}}s.
This means that by the time the entire 768 MB of data has been read, we have allocated 128 MB * 9
of byte[][] {{decodeInputs}} garbage waiting for GC.
We cannot depend on {{DFSStripedInputStream}} to keep the {{decodeInputs}} object and reuse it,
because every {{SpanInBlock}} is different.
I'm not sure if I've made this clear. If so, it's an issue, right? (Not related to this jira)
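The 128 MB * 9 estimate can be checked with a small standalone sketch. It is not HDFS code: the class and method names are hypothetical, and the parity count of 3 is an assumption inferred from the "* 9" figure (6 data blocks plus 3 parity blocks). It only sums the size of one {{decodeInputs}} allocation per aligned stripe, using the same dimensions as the allocation quoted earlier.

```java
// Standalone arithmetic sketch; names here are hypothetical, not HDFS APIs.
class DecodeInputsGarbage {
    static final int DATA_BLK_NUM = 6;    // from the 128 MB * 6 example
    static final int PARITY_BLK_NUM = 3;  // assumption: 6 + 3 = 9 internal blocks

    /**
     * Sums the bytes allocated if every aligned stripe gets a fresh
     * byte[dataBlkNum + parityBlkNum][spanInBlock] array, and the stripes
     * together cover one internal block of blockSize bytes.
     */
    static long totalDecodeInputBytes(long blockSize, long span) {
        long total = 0;
        for (long offset = 0; offset < blockSize; offset += span) {
            // The last stripe may be shorter than the others.
            long thisSpan = Math.min(span, blockSize - offset);
            total += (DATA_BLK_NUM + PARITY_BLK_NUM) * thisSpan;
        }
        return total;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long total = totalDecodeInputBytes(128 * mb, 64 * 1024);
        System.out.println("decodeInputs allocated: " + (total / mb) + " MB");
    }
}
```

Under these assumptions the total does not depend on the span size: the spans always add up to one 128 MB block, so the allocations add up to 9 * 128 MB = 1152 MB for a 768 MB file.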
bq. we need more abstraction than the util.
I'm +1 for this idea. I think we can resolve the {{decodeInputs}} issue in that abstraction.

> Erasure coding: remove workarounds in client side stripped blocks recovering
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-8481
>                 URL: https://issues.apache.org/jira/browse/HDFS-8481
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>         Attachments: HDFS-8481-HDFS-7285.00.patch, HDFS-8481-HDFS-7285.01.patch, HDFS-8481-HDFS-7285.02.patch
>
>
> After HADOOP-11847 and related fixes, we should be able to properly calculate decoded contents.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
