hadoop-hdfs-issues mailing list archives

From "Yi Liu (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-8319) Erasure Coding: support decoding for stateful read
Date Thu, 04 Jun 2015 02:15:38 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571976#comment-14571976
] 

Yi Liu edited comment on HDFS-8319 at 6/4/15 2:15 AM:
------------------------------------------------------

{quote}Yes, we can avoid this by always allocating direct buffer for parity blocks as well.
But unlike the buffer used by data blocks, this (64KB * 3) buffer may never be used if decoding
is unnecessary.{quote}
I'm not sure about this. The buffer allocation can happen once it's decided to recover an erasure; we don't need to allocate them initially. It looks like this isn't related to the buffer type? I may not be right, since I haven't read through the whole code yet.
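To make this concrete, a purely illustrative sketch (the names and the 64KB cell size are hypothetical, not from the branch) of allocating the parity buffers lazily on the decode path only:

{code:java}
import java.nio.ByteBuffer;

// Hypothetical illustration of lazy allocation; names are not from the branch.
class LazyParityBuffers {
  private static final int CELL_SIZE = 64 * 1024;
  private static final int NUM_PARITY_BLOCKS = 3;

  // Stays null until decoding is actually required.
  private ByteBuffer[] parityBuffers;

  ByteBuffer[] getParityBuffersForDecode() {
    if (parityBuffers == null) {
      parityBuffers = new ByteBuffer[NUM_PARITY_BLOCKS];
      for (int i = 0; i < NUM_PARITY_BLOCKS; i++) {
        // Allocated only once an erasure has to be recovered, so the
        // 64KB * 3 never materializes on the no-decoding path.
        parityBuffers[i] = ByteBuffer.allocateDirect(CELL_SIZE);
      }
    }
    return parityBuffers;
  }
}
{code}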
bq. This means all the input buffers' array() return the same (64KB * 6) byte array, while
their position are totally independent and can be all 0.
I see. Thanks a lot for the detailed explanation. We could also avoid the {{slice}} call and instead use {{duplicate}} with the position and limit set properly. I understand this may not seem as flexible, but we need some tradeoff. Another way to handle this case is to pass additional parameters like {{offsets}} and {{lengths}}, but that is a little more complex, though flexible.
Could we spec this constraint in the Javadoc?
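Just to make the {{slice}} vs. {{duplicate}} point concrete, here is a minimal standalone sketch against plain java.nio (not code from the patch; the 64KB cell size is only for illustration):

{code:java}
import java.nio.ByteBuffer;

public class BufferWindowSketch {
  private static final int CELL_SIZE = 64 * 1024;

  public static void main(String[] args) {
    // One big on-heap buffer shared by several inputs, as discussed above.
    ByteBuffer shared = ByteBuffer.allocate(CELL_SIZE * 6);

    // slice(): the new buffer's position is 0 and the window only shows up in arrayOffset().
    shared.position(CELL_SIZE * 2);
    shared.limit(CELL_SIZE * 3);
    ByteBuffer sliced = shared.slice();

    // duplicate(): same array(), arrayOffset() stays 0, and the window is expressed
    // purely through position and limit, which the caller sets explicitly.
    shared.clear();
    ByteBuffer dup = shared.duplicate();
    dup.position(CELL_SIZE * 2);
    dup.limit(CELL_SIZE * 3);

    System.out.println("slice():     position=" + sliced.position()
        + ", arrayOffset=" + sliced.arrayOffset());   // 0, 131072
    System.out.println("duplicate(): position=" + dup.position()
        + ", arrayOffset=" + dup.arrayOffset());      // 131072, 0
  }
}
{code}

The difference is where the window lives: in {{arrayOffset()}} for {{slice}}, and in position/limit for {{duplicate}}.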
bq. One question is whether the mixed scenario breaks the functionality?
It doesn't break the code in the branch, because the patches for the native coders (HADOOP-11540, etc.) are not in yet. With your changes it will break the native coders: once it's decided to proceed with {{usingDirectBuffer(true)}}, all the input/output buffers are treated as direct buffers and passed down to the JNI native code, and if any of them is not actually a direct buffer, it will core dump. The current Java coders happen to work because underneath they treat all input/output buffers simply as ByteBuffers (regardless of on-heap or direct), so they take a performance hit since no conversion to backing arrays happens.
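As a purely illustrative example (hypothetical names, not anything from HADOOP-11540), the direct-buffer path would need a guard along these lines before handing buffers to the JNI code:

{code:java}
import java.nio.ByteBuffer;

// Hypothetical guard; in EC decoding, null inputs mark erased units, hence the null check.
public final class BufferTypeCheck {
  private BufferTypeCheck() {}

  public static void ensureAllDirect(ByteBuffer[] inputs, ByteBuffer[] outputs) {
    for (ByteBuffer b : inputs) {
      if (b != null && !b.isDirect()) {
        throw new IllegalArgumentException("Input buffer is on-heap, not direct: " + b);
      }
    }
    for (ByteBuffer b : outputs) {
      if (!b.isDirect()) {
        throw new IllegalArgumentException("Output buffer is on-heap, not direct: " + b);
      }
    }
  }
}
{code}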
Looking at the long term, I'm not sure we will support mixing buffer types, because to do so we would have to convert all the buffers to a uniform type, either on-heap or direct, before calling into the underlying implementation, where the data bytes are uniformly retrieved, computed on, and stored via matrix and vector operations. The conversion requires copying data, though it is not complex to do.
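For reference, the conversion mentioned above could look roughly like this sketch (a hypothetical helper, not from the branch); the logic is simple, but it costs one allocation and one copy per on-heap buffer:

{code:java}
import java.nio.ByteBuffer;

// Hypothetical helper that copies an on-heap buffer into a direct one so that
// all buffers passed to the underlying implementation have a uniform type.
public final class BufferConvertSketch {
  private BufferConvertSketch() {}

  public static ByteBuffer toDirect(ByteBuffer buffer) {
    if (buffer.isDirect()) {
      return buffer;                       // already direct, nothing to do
    }
    ByteBuffer direct = ByteBuffer.allocateDirect(buffer.remaining());
    int savedPosition = buffer.position();
    direct.put(buffer);                    // copy the remaining bytes
    buffer.position(savedPosition);        // restore the caller's view of the source
    direct.flip();                         // make the copied bytes readable from position 0
    return direct;
  }
}
{code}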



> Erasure Coding: support decoding for stateful read
> --------------------------------------------------
>
>                 Key: HDFS-8319
>                 URL: https://issues.apache.org/jira/browse/HDFS-8319
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HDFS-8319.001.patch, HDFS-8319.002.patch, HDFS-8319.003.patch
>
>
> HDFS-7678 adds the decoding functionality for pread. This jira plans to add decoding
> to stateful read.



