hadoop-common-issues mailing list archives

From "Kai Zheng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11847) Enhance raw coder allowing to read least required inputs in decoding
Date Wed, 22 Apr 2015 21:39:01 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507966#comment-14507966 ]

Kai Zheng commented on HADOOP-11847:
------------------------------------

bq.We try to decode all null slots in the input arrays. I'm not sure if this will cause unnecessary
computation.
Yes, this is something I want to avoid. In theory it is possible to recover and compute only
the target blocks that were really erased, at least for RS codes. Currently we have to decode and
compute unintended blocks as well because of a limitation inherited from HDFS-RAID. Resolving
that limitation looks like a non-trivial task to me in the short term; would you agree to a
follow-on issue for it? I checked, and ISA-L does exactly what we want and doesn't have this limitation.


Regarding the RS->XOR optimization trick, it doesn't look like a good one now: the RS decoder
only needs to read 6 good blocks to recover the erased one, whereas the XOR decoder has to read
8 blocks. I will remove that code. As for the theory behind RS->XOR, you might search for and
read this paper if you're interested: "Flexible Parameterization of XOR based Codes for
Distributed Storage".
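
To illustrate the read-count comparison above: single-parity XOR decoding can rebuild an erased unit only by XOR-ing every other unit in its group, so a group of 9 units costs 8 reads, whereas an RS (6+3) decoder can work from 6 surviving units. The sketch below shows only the XOR side. The 9-unit group in which the last unit is the XOR of the other eight, and the names XorRecoverySketch and xorRecover, are illustrative assumptions, not the code in the patch.

```java
import java.util.Arrays;

public class XorRecoverySketch {
    /**
     * XORs together every unit except the one at erasedIndex.
     * Hypothetical helper: it fails if any other unit is missing, which is
     * exactly why XOR decoding needs all survivors of the group.
     */
    static byte[] xorRecover(byte[][] units, int erasedIndex, int unitLength) {
        byte[] out = new byte[unitLength];
        for (int u = 0; u < units.length; u++) {
            if (u == erasedIndex) {
                continue;
            }
            if (units[u] == null) {
                throw new IllegalArgumentException("XOR decoding needs unit " + u);
            }
            for (int j = 0; j < unitLength; j++) {
                out[j] ^= units[u][j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int groupSize = 9;   // assumed group: 8 units plus their XOR
        int unitLength = 4;
        byte[][] units = new byte[groupSize][];
        for (int u = 0; u < groupSize - 1; u++) {
            units[u] = new byte[unitLength];
            for (int j = 0; j < unitLength; j++) {
                units[u][j] = (byte) (u * 31 + j);
            }
        }
        // The last unit is the XOR of the first eight.
        units[groupSize - 1] = xorRecover(units, groupSize - 1, unitLength);

        // Erase unit 2 and rebuild it from the remaining eight units.
        byte[] original = units[2];
        units[2] = null;
        byte[] recovered = xorRecover(units, 2, unitLength);

        System.out.println("recovered matches original: " + Arrays.equals(original, recovered));
        System.out.println("units read for XOR recovery: " + (groupSize - 1));
    }
}
```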

For the other points, I will double-check my code later and add more comments to explain or
clarify them.

> Enhance raw coder allowing to read least required inputs in decoding
> --------------------------------------------------------------------
>
>                 Key: HADOOP-11847
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11847
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: io
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>         Attachments: HADOOP-11847-v1.patch
>
>
> This is to enhance the raw erasure coder so that it reads only the least required inputs while
decoding. It will also refine and document the relevant APIs for better understanding and
usage. Using the least required inputs may add computing overhead, but it will possibly
perform better overall since less network traffic and disk I/O are involved.
> This was already planned, but I was reminded of it by [~zhz]'s question raised in
HDFS-7678, also copied here:
> bq.Kai Zheng I have a question about decoding: in a (6+3) schema, if block #2 is missing,
and I want to repair it with blocks 0, 1, 3, 4, 5, 8, how should I construct the inputs to
RawErasureDecoder#decode?
> With this work, hopefully the answer to the above question will be obvious.
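
One way to picture the question quoted above is a positional input array with one slot per unit in the (6+3) schema, where units that were not read (including the erased one) are left as null. The sketch below only builds such arrays. The class DecodeInputsSketch, the helper buildInputs, and the null-slot convention are assumptions for illustration; the commented-out decode call stands in for whatever contract this issue finally settles on.

```java
import java.nio.ByteBuffer;

public class DecodeInputsSketch {
    /**
     * Builds a positional input array with one slot per unit.
     * Slots for units that were not read stay null.
     * Hypothetical helper, not part of any Hadoop API.
     */
    static ByteBuffer[] buildInputs(int numUnits, int[] readIndexes, ByteBuffer[] readBuffers) {
        ByteBuffer[] inputs = new ByteBuffer[numUnits]; // all slots start as null
        for (int i = 0; i < readIndexes.length; i++) {
            inputs[readIndexes[i]] = readBuffers[i];
        }
        return inputs;
    }

    public static void main(String[] args) {
        int numDataUnits = 6;
        int numParityUnits = 3;
        int cellSize = 4;                        // tiny size, for illustration only
        int[] readIndexes = {0, 1, 3, 4, 5, 8};  // the six blocks actually read
        int[] erasedIndexes = {2};               // the block to repair

        ByteBuffer[] readBuffers = new ByteBuffer[readIndexes.length];
        for (int i = 0; i < readBuffers.length; i++) {
            readBuffers[i] = ByteBuffer.allocate(cellSize);
        }

        ByteBuffer[] inputs = buildInputs(numDataUnits + numParityUnits, readIndexes, readBuffers);
        ByteBuffer[] outputs = {ByteBuffer.allocate(cellSize)}; // one output per erased index

        // Assumed call shape, shown as a comment because the contract is what
        // this issue is refining:
        // decoder.decode(inputs, erasedIndexes, outputs);

        for (int i = 0; i < inputs.length; i++) {
            System.out.println("slot " + i + ": " + (inputs[i] == null ? "null" : "buffer"));
        }
        System.out.println("erased: " + erasedIndexes[0] + ", outputs: " + outputs.length);
    }
}
```

In this layout slots 2, 6 and 7 are null: slot 2 because it is the erased block, slots 6 and 7 because those parity blocks were simply not read.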



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
