hadoop-hdfs-issues mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-6772) Get DNs out of blockContentsStale==true state faster when NN restarts
Date Wed, 30 Jul 2014 16:28:38 GMT

     [ https://issues.apache.org/jira/browse/HDFS-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ming Ma updated HDFS-6772:
--------------------------

    Attachment: HDFS-6772.patch

Here is the patch.

1. After a DN re-registers with the NN, the heartbeat will be sent before the blockreport.
2. Add a new metric for the number of storages in the blockContentsStale state. It is useful for general cluster operation and for unit tests.

> Get DNs out of blockContentsStale==true state faster when NN restarts
> ---------------------------------------------------------------------
>
>                 Key: HDFS-6772
>                 URL: https://issues.apache.org/jira/browse/HDFS-6772
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ming Ma
>         Attachments: HDFS-6772.patch
>
>
> Here is the non-HA scenario.
> 1. Get HDFS into a block-over-replicated state.
> 2. Restart the NN.
> 3. From the NN's point of view, DNs will remain in the blockContentsStale==true state for a long time. That in turn makes postponedMisreplicatedBlocks large, and a larger postponedMisreplicatedBlocks increases blockreport latency. Because the blockreport takes the NN global lock, this has a severe impact on NN performance and makes the cluster unstable.
> Why will DNs remain in blockContentsStale==true state for a long time?
> 1. When a DN reconnects to the NN after an NN restart, the blockreport RPC can arrive before the heartbeat RPC. That is due to how BPServiceActor#offerService decides when to send the blockreport and the heartbeat. In the case of an NN restart, the NN asks the DN to register when it gets the first heartbeat request; the DN then registers with the NN and sends the blockreport RPC; the heartbeat RPC comes only after that.
> 2. So right after the first blockreport, because heartbeatedSinceFailover is still false, blockContentsStale stays true.
> {noformat}
> DatanodeStorageInfo.java
>   void receivedBlockReport() {
>     if (heartbeatedSinceFailover) {
>       blockContentsStale = false;
>     }
>     blockReportCount++;
>   }
> {noformat}
> 3. So the DN will remain in the blockContentsStale==true state until the next blockreport. On a big cluster, dfs.blockreport.intervalMsec could be set to a large value.
>  
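
The ordering problem described above can be reproduced with a small standalone sketch. This is not the HDFS code: StorageInfo is a simplified stand-in for DatanodeStorageInfo that keeps only the fields quoted above, receivedHeartbeat() is assumed to do nothing more than set heartbeatedSinceFailover, and StaleStateSketch and staleAfter are hypothetical names used only for illustration.

```java
import java.util.List;

public class StaleStateSketch {

    // Simplified stand-in for DatanodeStorageInfo (assumption: only the
    // fields from the quoted snippet matter for this illustration).
    static class StorageInfo {
        boolean heartbeatedSinceFailover = false; // state right after NN restart
        boolean blockContentsStale = true;        // state right after NN restart
        int blockReportCount = 0;

        // Assumption: a heartbeat simply marks the storage as heard from.
        void receivedHeartbeat() {
            heartbeatedSinceFailover = true;
        }

        // Same logic as the quoted receivedBlockReport().
        void receivedBlockReport() {
            if (heartbeatedSinceFailover) {
                blockContentsStale = false;
            }
            blockReportCount++;
        }
    }

    // Hypothetical helper: replays RPCs in the given order against a fresh
    // storage and reports whether it is still considered stale afterwards.
    static boolean staleAfter(List<String> rpcs) {
        StorageInfo storage = new StorageInfo();
        for (String rpc : rpcs) {
            if (rpc.equals("heartbeat")) {
                storage.receivedHeartbeat();
            } else if (rpc.equals("blockreport")) {
                storage.receivedBlockReport();
            } else {
                throw new IllegalArgumentException("unknown RPC: " + rpc);
            }
        }
        return storage.blockContentsStale;
    }

    public static void main(String[] args) {
        // Blockreport first: the storage stays stale until the next blockreport.
        System.out.println(staleAfter(List.of("blockreport", "heartbeat")));
        // Heartbeat first: the first blockreport clears the stale flag.
        System.out.println(staleAfter(List.of("heartbeat", "blockreport")));
    }
}
```

With the blockreport replayed first, the flag is still set after both RPCs; with the heartbeat first, the first blockreport already clears it, which is the ordering the patch aims for.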



--
This message was sent by Atlassian JIRA
(v6.2#6252)
