hadoop-common-issues mailing list archives

From "Walter Su (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12047) Indicate preference not to affect input buffers during coding in erasure coder
Date Wed, 28 Oct 2015 16:58:27 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14978747#comment-14978747 ]

Walter Su commented on HADOOP-12047:
------------------------------------

Oh, you did expose it through {{setCoderOption()}}. Could you set it in {{DFSStripedOutputStream}}
and {{DFSStripedInputStream}} (maybe in another JIRA)? Also, set it through the constructor instead of
{{setCoderOption()}}, and make it unmodifiable. A default {{CoderOption}} in {{AbstractRawErasureCoder}}
is useless: a user can change the default coder, but the implementation of {{DFSStripedInputStream}}
stays the same. It must always explicitly disallow the coder from changing the inputs, no matter
which coder is used and whether or not that coder would in fact change them.
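
A minimal sketch of the constructor-set, unmodifiable preference suggested above. All names here
({{ReadOnlyInputSketch}}, {{XorEncoder}}, {{allowChangeInputs}}) are hypothetical and are not the
Hadoop API; a toy XOR encoder stands in for any raw coder, and "not changing the inputs" is assumed
to mean leaving the caller's buffer positions and contents untouched.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; none of these names are the actual Hadoop classes.
class ReadOnlyInputSketch {

    /** Toy XOR encoder whose input-handling preference is fixed at construction. */
    static final class XorEncoder {
        // Final field, no setter: the preference cannot change after construction.
        private final boolean allowChangeInputs;

        XorEncoder(boolean allowChangeInputs) {
            this.allowChangeInputs = allowChangeInputs;
        }

        /** Writes the XOR of all inputs into output; inputs must have equal remaining bytes. */
        void encode(ByteBuffer[] inputs, ByteBuffer output) {
            ByteBuffer[] views = new ByteBuffer[inputs.length];
            for (int i = 0; i < inputs.length; i++) {
                // duplicate() shares the bytes but has its own position and limit,
                // so reading through it leaves the caller's buffer state untouched.
                views[i] = allowChangeInputs ? inputs[i] : inputs[i].duplicate();
            }
            int len = views[0].remaining();
            for (int j = 0; j < len; j++) {
                byte parity = 0;
                for (ByteBuffer view : views) {
                    parity ^= view.get();
                }
                output.put(parity);
            }
            output.flip();
        }
    }

    public static void main(String[] args) {
        ByteBuffer a = ByteBuffer.wrap(new byte[] {1, 2, 3});
        ByteBuffer b = ByteBuffer.wrap(new byte[] {4, 5, 6});
        ByteBuffer parity = ByteBuffer.allocate(3);

        // A stream-like caller always asks for untouched inputs, whatever coder it gets.
        new XorEncoder(false).encode(new ByteBuffer[] {a, b}, parity);
        System.out.println(a.position() + " " + b.position()); // prints "0 0"
    }
}
```

In this sketch the caller, not a coder-level default, decides the preference, so swapping in a
different coder implementation would not change what the caller is guaranteed.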

> Indicate preference not to affect input buffers during coding in erasure coder
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-12047
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12047
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>             Fix For: HDFS-7285
>
>         Attachments: HADOOP-12047-HDFS-7285-v1.patch, HADOOP-12047-v2.patch, initial-poc.patch
>
>
> It's good to define and ensure that input buffers are not affected during the coding process in
> raw erasure coders. The following is copied from a discussion with [~jingzhao] in HDFS-8481:
> bq. In that case we cannot reuse the source buffers I guess? Then do we need to expose
> this information in the decoder?
> bq. Good catch Jing! Yes, in this case we can't reuse the source buffers here, as they
> need to be passed to the caller/applications without being changed. I'm planning to re-implement
> the Java coders in HADOOP-12041 and related issues; when done, it will be possible to ensure the
> input buffers are not affected. Benefits of doing this in the coder layer: 1) a clearer contract
> between coder and caller, in a more general sense, for the inputs; 2) a concrete coder may have
> specific tweaks to optimize this aspect: ideally no input data copying at all, at worst making
> the copy, but all transparent to callers; 3) new coders (LRC, HH) can be layered on other
> primitive coders (RS, XOR) more easily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
