hadoop-hdfs-issues mailing list archives

From "Brahma Reddy Battula (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9917) IBR accumulate more objects when SNN was down for sometime.
Date Wed, 09 Mar 2016 07:54:40 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15186699#comment-15186699 ]

Brahma Reddy Battula commented on HDFS-9917:
--------------------------------------------

 *{color:blue}As the current intention is not to overload the NN{color}, I am planning to fix it as follows:*

 - *{color:green}Clear the IBRs on re-registration with the NameNode.{color}*

{code}
void reRegister() throws IOException {
  if (shouldRun()) {
    // Re-retrieve namespace info to make sure that, if the NN
    // was restarted, we still match its version (HDFS-2120).
    NamespaceInfo nsInfo = retrieveNamespaceInfo();
    // And re-register.
    register(nsInfo);
    scheduler.scheduleHeartbeat();
    // HDFS-9917: the pending IBRs for a standby NN can grow very large
    // if that NN is down for some time, so drop them on re-registration.
    if (state == HAServiceState.STANDBY) {
      ibrManager.clearIBRs();
    }
  }
}
{code}
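
To make the intended behaviour easier to discuss, here is a minimal, self-contained sketch of the idea. It is not the real DataNode code: {{IbrClearSketch}}, {{NnState}}, {{PendingIbrs}} and {{Actor}} are hypothetical stand-ins for the actual HDFS classes, and it assumes the standby NN can resynchronize later (for example through a full block report), so dropping the queued IBRs loses nothing permanently.

```java
// Hypothetical sketch only: none of these types are the real HDFS classes.
final class IbrClearSketch {

    /** Stand-in for the HA state of the NameNode an actor talks to. */
    enum NnState { ACTIVE, STANDBY }

    /** Stand-in for the queue of pending incremental block reports (IBRs). */
    static final class PendingIbrs {
        private final java.util.List<String> entries = new java.util.ArrayList<>();

        synchronized void add(String entry) { entries.add(entry); }

        synchronized int size() { return entries.size(); }

        synchronized void clear() { entries.clear(); }
    }

    /** Stand-in for the DataNode-side actor serving one NameNode. */
    static final class Actor {
        final NnState state;
        final PendingIbrs pending = new PendingIbrs();

        Actor(NnState state) { this.state = state; }

        void reRegister() {
            // Registration and heartbeat scheduling are omitted in this sketch.
            // Only the backlog queued for a standby NN is dropped; IBRs for
            // the active NN are kept and sent as usual.
            if (state == NnState.STANDBY) {
                pending.clear();
            }
        }
    }

    public static void main(String[] args) {
        Actor standby = new Actor(NnState.STANDBY);
        Actor active = new Actor(NnState.ACTIVE);
        // Simulate IBRs piling up while the NameNodes are unreachable.
        for (int i = 0; i < 3; i++) {
            standby.pending.add("delete blk_" + i);
            active.pending.add("delete blk_" + i);
        }
        standby.reRegister();
        active.reRegister();
        System.out.println("standby pending after re-register: " + standby.pending.size());
        System.out.println("active pending after re-register: " + active.pending.size());
    }
}
```

The point of the sketch is only that the clear is conditional on the standby state, so the active NN never loses an IBR.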

Any thoughts on this?

> IBR accumulate more objects when SNN was down for sometime.
> -----------------------------------------------------------
>
>                 Key: HDFS-9917
>                 URL: https://issues.apache.org/jira/browse/HDFS-9917
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>
> The SNN was down for some time for some reason. After the SNN was restarted, it became
> unresponsive because:
> - 29 DNs were each sending IBRs with about 5 million entries (most of them delete IBRs),
> whereas each DataNode had only ~2.5 million blocks.
> - GC could not reclaim these objects, since they were all held in the RPC queue.
> To recover (that is, to clear these objects), all the DNs were restarted one by one. This
> issue happened in 2.4.1, where splitting of block reports was not available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
