hadoop-hdfs-issues mailing list archives

From "Yi Liu (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-8328) Follow-on to update decode for DataNode striped blocks reconstruction
Date Wed, 03 Jun 2015 02:22:50 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570144#comment-14570144 ]

Yi Liu edited comment on HDFS-8328 at 6/3/15 2:22 AM:
------------------------------------------------------

Thanks to Zhe, Walter and Kai for your comments.  My replies:

--------------------------- *To Zhe:* --------------------------
{quote}
If initial connection to a target DN (chosen by NN) is unsuccessful, getErasedIndices will
not add the index in erasedIndices; therefore the block will not be reconstructed. Should
we send an ack to NN so the block can come back to UnderReplicatedBlocks?
{quote}
This is a good question.  I think there is no need currently. This is the same as for contiguous
block replication: if replication fails, the DN doesn't send an ack to the NN.  So yes, the block
replica has already been removed from {{neededReplications}}; the next time some client requests
the block replica, the client will report the bad replica to the NN (and the NN will check whether
it's indeed a bad replica), so the corrupted block replica is added back to {{neededReplications}}
again.  I'd like to keep the same behavior.

{quote}
In each iteration of while (positionInBlock < firstStripedBlockLength), targetsStatus is
reset to reflect if the transfer to each target is successful. So when an element is flipped
from true to false, it will never be flipped back.
{quote}
Actually {{targetsStatus}} is *not* reset in each iteration.  {{targetsStatus}} records a failed
target once; once a target has failed, we never transfer data to it again.  Let me add more
documentation.
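
Roughly, the semantics are like this (a simplified sketch with illustrative names, not the
actual patch code):
{code:java}
import java.io.IOException;
import java.util.Arrays;

// Sketch: targetsStatus is created once before the transfer loop and is
// never reset inside it, so a flag flipped to false stays false for the
// rest of the reconstruction.
class TargetsStatusSketch {
  static void reconstruct(int numTargets, long blockLength) {
    boolean[] targetsStatus = new boolean[numTargets];
    Arrays.fill(targetsStatus, true);     // assume all targets healthy at start
    long positionInBlock = 0;
    final int cellSize = 64 * 1024;
    while (positionInBlock < blockLength) {
      for (int i = 0; i < numTargets; i++) {
        if (!targetsStatus[i]) {
          continue;                       // once failed, skipped forever
        }
        try {
          transfer(i, positionInBlock);   // hypothetical transfer helper
        } catch (IOException e) {
          targetsStatus[i] = false;       // flipped exactly once, never back
        }
      }
      positionInBlock += cellSize;        // advance to the next stripe
    }
  }

  static void transfer(int target, long pos) throws IOException {
    // placeholder for the real data transfer to the target DN
  }
}
{code}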

{quote}
zeroStripeBuffers is created based on whether the entire internal block is empty. Ideally
we should have such an array in each iteration of while (positionInBlock < firstStripedBlockLength)
so we can avoid issuing read requests as much as possible. I guess this can be a TODO for
follow-on.
{quote}
I also thought we should find the empty internal blocks first so that fewer reads are required,
but only data blocks can be empty, and in most cases the data blocks are tried first anyway. It
is still an improvement, though.  Agreed to add a TODO and do it in a follow-on.
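
The per-iteration version could look roughly like this (a sketch assuming that an internal
block shorter than the current position contributes only zeros for that stripe; names are
illustrative, not the patch code):
{code:java}
import java.nio.ByteBuffer;

// Sketch of the follow-on TODO: decide per stripe, rather than per block,
// whether a source needs a real read or can be served from a zero buffer.
class ZeroStripeSketch {
  static ByteBuffer[] gatherInputs(long[] blockLengths, long positionInBlock,
                                   int cellSize) {
    ByteBuffer[] inputs = new ByteBuffer[blockLengths.length];
    for (int i = 0; i < blockLengths.length; i++) {
      if (blockLengths[i] <= positionInBlock) {
        // This internal block has no data at this stripe: feed the decoder
        // zeros instead of issuing a read request to the source DN.
        inputs[i] = ByteBuffer.allocate(cellSize);  // zero-filled by default
      } else {
        inputs[i] = readFromSource(i, positionInBlock, cellSize);
      }
    }
    return inputs;
  }

  static ByteBuffer readFromSource(int idx, long pos, int len) {
    return ByteBuffer.allocate(len);  // placeholder for a real remote read
  }
}
{code}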

{quote}
The updated {{scheduleNewRead}} is not so easy to follow.
{quote}
We talked about this in the offline meeting :)  Let me add more documentation.

{quote}
The argument and return value of readMinimumStripedData4Recovery are not trivial to understand.
Maybe add them to the Javadoc?
{quote}
I will also add more documentation about that.

{quote}
Since we are fetching mostly fixed size chunks, it's ideal to reuse buffers instead of allocating
new ones in addStripedReader. Walter Su has a good analysis under HDFS-8481 on the GC issue.
{quote}
Actually each {{stripedReader}} is allocated only once, so the buffers are reused and there is
no GC issue.  Suppose the coder schema is 6+3 and there are 2 corrupted striped blocks, so there
are 7 source DNs and 2 target DNs.  If all the sources are valid, we only allocate the minimum
number (6) of buffers and reuse them; in the worst case we allocate 7 buffers and reuse them.
I think that's OK.  Of course, we actually only need to allocate 6 buffers in any situation;
this is a minor improvement, so let me do it together with the above.
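
For the 6+3 example, the allocation pattern is roughly the following (an illustrative sketch,
not the patch code):
{code:java}
import java.nio.ByteBuffer;

// Sketch of the buffer-reuse argument for a 6+3 schema with 2 corrupted
// blocks: buffers are created once per source DN (at most 7 here), never
// per iteration, and each one is reused for every stripe. The "minor
// improvement" is to cap the pool at the decode minimum, 6.
class BufferReuseSketch {
  static final int CELL_SIZE = 64 * 1024;

  public static void main(String[] args) {
    int sourceDnNum = 7;                  // 9 internal blocks - 2 corrupted
    ByteBuffer[] buffers = new ByteBuffer[sourceDnNum];
    for (int i = 0; i < sourceDnNum; i++) {
      buffers[i] = ByteBuffer.allocate(CELL_SIZE);  // allocated exactly once
    }
    long stripes = 10;                    // however many stripes the block has
    for (long s = 0; s < stripes; s++) {
      for (ByteBuffer buf : buffers) {
        buf.clear();                      // reuse: no per-iteration allocation
      }
      // read into buffers, decode, transfer to targets ...
    }
  }
}
{code}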

--------------------------- *To Walter:* --------------------------
{quote}
The first while loop should be guarded with !used.get(m). m starting from stripedReaders.size()
doesn't mean it's not used.
{quote}
Same as my reply to Zhe, I think I need to add more documentation here :)  Each {{stripedReader}}
is allocated only once, so the current logic is correct.
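
The invariant I rely on can be sketched like this (illustrative names, not the patch code):
readers are appended for sources at most once and in index order, so any index at or beyond
{{stripedReaders.size()}} belongs to a source that has never been used.
{code:java}
import java.util.ArrayList;
import java.util.List;

// Sketch of the invariant behind the first while loop: stripedReaders only
// grows, one reader per consumed source, so m >= stripedReaders.size()
// always points at a source no reader has been allocated for yet. That is
// why no extra !used.get(m) guard is needed there.
class ReaderInvariantSketch {
  final List<Object> stripedReaders = new ArrayList<>();

  int scheduleFromNextSource(int totalSources) {
    int m = stripedReaders.size();        // first source without a reader
    while (m < totalSources) {
      stripedReaders.add(newReader(m));   // allocated at most once per source
      if (connectOk(m)) {
        return m;                         // schedule the new read here
      }
      m++;
    }
    return -1;                            // no usable source left
  }

  Object newReader(int source) { return new Object(); }  // placeholder
  boolean connectOk(int source) { return true; }         // placeholder
}
{code}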

{quote}
paddingBufferToLen(..) for multiple zeroStripeBuffers[] is wasting time. One zeroStripeBuffer
is enough.
{quote}
I considered this too; I think Zhe has already given you the reason in his reply.

--------------------------- *To Kai:* --------------------------
{quote}
Noticed a typo: covertIndex4Decode should be convertIndex4Decode.
{quote}
Right, will update it.


> Follow-on to update decode for DataNode striped blocks reconstruction
> ---------------------------------------------------------------------
>
>                 Key: HDFS-8328
>                 URL: https://issues.apache.org/jira/browse/HDFS-8328
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Yi Liu
>            Assignee: Yi Liu
>         Attachments: HDFS-8328-HDFS-7285.001.patch, HDFS-8328-HDFS-7285.002.patch
>
>
> Currently the decode for DataNode striped block reconstruction is a workaround; we need to
> update it after the decode fix in HADOOP-11847.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
