hadoop-hdfs-issues mailing list archives

From "Uma Maheswara Rao G (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9381) When same block came for replication for Striped mode, we can move that block to PendingReplications
Date Thu, 12 Nov 2015 07:36:11 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001798#comment-15001798 ]

Uma Maheswara Rao G commented on HDFS-9381:
-------------------------------------------

{quote}
These internal blocks can be reported at different times. When one internal block is reported,
we still need to wait for the others to be reported instead of forcing a timeout.
{quote}
After the pending replication timeout, the NN no longer keeps any reference to that block in
pendingReplications; a single entry for the block is added back to neededReplications. When that
entry is processed from neededReplications, the current details of the whole blockGroup are looked
up again. So that should also cover any internal blocks that went missing recently, right?

{quote}
But I think they are rare? We have a block of this kind when we are short of DNs: we can't choose
enough DNs to schedule recovery at once, so we schedule twice.
{quote}
The point here is that a single block is enough to trigger this case. When a block from
neededReplications has just been sent for replication and another block comes in immediately,
then if you simply skip it without removing it from neededReplications, the same block will be
picked again and again, only to return each time. The NN fsnLock is very important for other
operations in general, so my point was: why should processing this block do work under the lock
for no purpose? Yes, overall the case may be rare in a healthy cluster with few failures :-)
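
To make the cost concrete, here is a toy model of the loop described above. It is only a sketch under stated assumptions: the class `SkipWithoutRemoveSketch`, the method `wastedPicks`, and the string block IDs are hypothetical and are not the real BlockManager API. A `ReentrantLock` stands in for the fsnLock, and a plain queue stands in for neededReplications.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

/** Toy model only; none of these names come from the Hadoop code base. */
class SkipWithoutRemoveSketch {
    // Hypothetical stand-in for the namesystem lock.
    static final ReentrantLock fsnLock = new ReentrantLock();

    /**
     * Runs `rounds` scheduling passes and returns how many times a block was
     * examined under the lock without any new work being scheduled.
     */
    static int wastedPicks(Deque<String> needed, Map<String, Integer> pending,
                           int rounds, boolean removeWhenPending) {
        int wasted = 0;
        for (int i = 0; i < rounds; i++) {
            fsnLock.lock();
            try {
                String block = needed.peekFirst();
                if (block == null) {
                    continue; // nothing queued; the finally block still unlocks
                }
                if (pending.getOrDefault(block, 0) > 0) {
                    // A recovery is already in flight, so this pass does no useful work.
                    wasted++;
                    if (removeWhenPending) {
                        needed.pollFirst(); // assumed alternative: take it off the queue
                    }
                }
            } finally {
                fsnLock.unlock();
            }
        }
        return wasted;
    }

    public static void main(String[] args) {
        Map<String, Integer> pending = new HashMap<>();
        pending.put("blk_1", 1);

        Deque<String> skipOnly = new ArrayDeque<>();
        skipOnly.add("blk_1");
        System.out.println("skip only: " + wastedPicks(skipOnly, pending, 5, false));

        Deque<String> skipAndRemove = new ArrayDeque<>();
        skipAndRemove.add("blk_1");
        System.out.println("skip and remove: " + wastedPicks(skipAndRemove, pending, 5, true));
    }
}
```

In this model, five passes with skip-only handling examine the same block five times under the lock, while removing it on the first pass leaves a single wasted examination.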

> When same block came for replication for Striped mode, we can move that block to PendingReplications
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-9381
>                 URL: https://issues.apache.org/jira/browse/HDFS-9381
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding, namenode
>    Affects Versions: 3.0.0
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>         Attachments: HDFS-9381.00.patch
>
>
> Currently, I noticed that in the replication flow for striped blocks we just return null if the
> block already exists in pendingReplications.
> {code}
> if (block.isStriped()) {
>   if (pendingNum > 0) {
>     // Wait the previous recovery to finish.
>     return null;
>   }
> {code}
> Here, if we just return null and neededReplications contains only a few blocks (basically, by
> default, fewer than numliveNodes*2), the same blocks can be picked again from neededReplications
> in the next loop, because we are not removing the element from neededReplications. Since this
> replication processing has to take the fsnamesystem lock, we may spend some time unnecessarily
> in every loop.
> So my suggestion/improvement is:
> Instead of just returning null, how about incrementing pendingReplications for this block and
> removing it from neededReplications? Another point to consider: to add a block to
> pendingReplications, we generally need a target, which is simply the node to which we issued the
> replication command. Later, after the replication succeeds and the DN reports it, the block is
> removed from pendingReplications in NN addBlock.
> Since this is a newly picked block from neededReplications, we would not have selected a target
> yet. So which target should be passed to pendingReplications if we add this block? One option I
> am thinking of: how about passing srcNode itself as the target for this special condition? If
> the block is really missing, srcNode will not report it, so the block will not be removed from
> pendingReplications. When it times out, it will be considered for replication again, and that
> time it will find an actual target to replicate to as part of the regular replication flow.
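
The proposal in the description can be sketched roughly as follows. This is an illustration under assumptions, not the attached patch: `PendingPlaceholderSketch`, `parkWithSourceAsTarget`, `blockReceived`, and `processTimeouts` are hypothetical names, nodes and blocks are plain strings, and time is passed in explicitly so the behaviour is deterministic.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

/** Toy model of the suggestion; names are illustrative, not Hadoop APIs. */
class PendingPlaceholderSketch {
    static final class PendingEntry {
        final Set<String> targets = new HashSet<>();
        final long addedAtMs;
        PendingEntry(long addedAtMs) { this.addedAtMs = addedAtMs; }
    }

    final Deque<String> needed = new ArrayDeque<>();            // stand-in for neededReplications
    final Map<String, PendingEntry> pending = new HashMap<>();  // stand-in for pendingReplications
    final long timeoutMs;

    PendingPlaceholderSketch(long timeoutMs) { this.timeoutMs = timeoutMs; }

    /** Instead of returning null and leaving the block queued, park it in pending. */
    void parkWithSourceAsTarget(String block, String srcNode, long nowMs) {
        needed.remove(block);
        // No real target has been chosen yet, so srcNode is recorded as a placeholder.
        pending.computeIfAbsent(block, b -> new PendingEntry(nowMs)).targets.add(srcNode);
    }

    /** A node reports the block; the entry is dropped once no targets remain. */
    void blockReceived(String block, String node) {
        PendingEntry e = pending.get(block);
        if (e != null && e.targets.remove(node) && e.targets.isEmpty()) {
            pending.remove(block);
        }
    }

    /** Timed-out entries go back to the needed queue for regular processing. */
    void processTimeouts(long nowMs) {
        Iterator<Map.Entry<String, PendingEntry>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, PendingEntry> e = it.next();
            if (nowMs - e.getValue().addedAtMs >= timeoutMs) {
                String block = e.getKey();
                it.remove();
                needed.addLast(block);
            }
        }
    }

    public static void main(String[] args) {
        PendingPlaceholderSketch s = new PendingPlaceholderSketch(1000);
        s.needed.add("blk_1");
        s.parkWithSourceAsTarget("blk_1", "dn-src", 0);
        s.processTimeouts(1000); // srcNode never reported, so the entry times out
        System.out.println("needed after timeout: " + s.needed);
    }
}
```

The design choice being illustrated is that the placeholder target is never expected to report a missing block, so the timeout path, not the report path, is what returns the block to the regular replication flow.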



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
