hadoop-hdfs-issues mailing list archives

From "wei.he (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-8258) namenode shutdown strangely
Date Sun, 26 Apr 2015 17:06:38 GMT

     [ https://issues.apache.org/jira/browse/HDFS-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

wei.he updated HDFS-8258:
-------------------------
    Description: 
Hi,
    I am using Hadoop 2.5.

   The standby NameNode shut down after I restarted one of the DataNodes. I got the following
error by tailing the last logs.

.......................................
2015-04-23 15:37:39,133 INFO  BlockStateChange (BlockManager.java:logAddStoredBlock(2343))
- BLOCK* addStoredBlock: blockMap updated: 192.168.146.223:50010 is added to blk_1277475690_203782236{blockUCState=UNDER_CONSTRUCTION,
primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-5dc915b5-a44d-441c-b960-87e875edb5a8:NORMAL|RBW],
ReplicaUnderConstruction[[DISK]DS-50f65630-347e-4e32-b3b9-9eb904db9577:NORMAL|RBW]]} size
0
2015-04-23 15:37:39,137 INFO  hdfs.StateChange (DatanodeManager.java:registerDatanode(873))
- BLOCK* registerDatanode: from DatanodeRegistration(192.168.146.130, datanodeUuid=ee834f1d-ba78-48de-bd7a-c364b67b535f,
infoPort=50075, ipcPort=8010, storageInfo=lv=-55;cid=CID-64dc0d1c-f525-432b-9b28-2b92262d6111;nsid=740344496;c=0)
storage ee834f1d-ba78-48de-bd7a-c364b67b535f
2015-04-23 15:37:39,138 INFO  namenode.NameNode (DatanodeManager.java:registerDatanode(881))
- BLOCK* registerDatanode: 192.168.146.130:50010
2015-04-23 15:37:39,261 INFO  net.NetworkTopology (NetworkTopology.java:remove(482)) - Removing
a node: /hadoop2.0/rack_/YSC801/D7_2_5/192.168.146.130:50010
2015-04-23 15:37:39,262 INFO  net.NetworkTopology (NetworkTopology.java:add(413)) - Adding
a new node: /hadoop2.0/rack_/YSC801/D7_2_5/192.168.146.130:50010
2015-04-23 15:37:39,264 WARN  namenode.FSNamesystem (FSNamesystem.java:getCorruptFiles(6775))
- Get corrupt file blocks returned error: Operation category READ is not supported in state
standby
2015-04-23 15:37:39,264 FATAL blockmanagement.BlockManager (BlockManager.java:run(3390)) -
ReplicationMonitor thread received Runtime exception.
java.lang.NullPointerException
        at java.util.TreeMap.getEntry(TreeMap.java:342)
        at java.util.TreeMap.get(TreeMap.java:273)
        at org.apache.hadoop.hdfs.server.blockmanagement.InvalidateBlocks.invalidateWork(InvalidateBlocks.java:137)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.invalidateWorkForOneNode(BlockManager.java:3231)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeInvalidateWork(BlockManager.java:1191)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3431)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3375)
        at java.lang.Thread.run(Thread.java:744)
2015-04-23 15:37:39,267 INFO  blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(667))
- Adding new storage ID DS-f0b04209-2f6a-491b-9f28-173c4c53d364 for DN 192.168.146.130:50010
2015-04-23 15:37:39,268 INFO  blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(667))
- Adding new storage ID DS-d255d5a4-4543-4621-b258-4c575843f29c for DN 192.168.146.130:50010
2015-04-23 15:37:39,268 INFO  blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(667))
- Adding new storage ID DS-4a88b36b-7ae6-4f30-b95c-0c4e47d70878 for DN 192.168.146.130:50010
2015-04-23 15:37:39,268 INFO  blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(667))
- Adding new storage ID DS-d4166bd7-a8c0-4067-8c68-78c6c31dcd9e for DN 192.168.146.130:50010
2015-04-23 15:37:39,268 INFO  blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(667))
- Adding new storage ID DS-468eeca6-e45c-428f-811a-71c5d1f04a9f for DN 192.168.146.130:50010
2015-04-23 15:37:39,269 INFO  BlockStateChange (BlockManager.java:logAddStoredBlock(2343))
- BLOCK* addStoredBlock: blockMap updated: 192.168.146.34:50010 is added to blk_1253664895_179969194
size 285087186
2015-04-23 15:37:39,271 INFO  BlockStateChange (BlockManager.java:logAddStoredBlock(2343))
- BLOCK* addStoredBlock: blockMap updated: 192.168.146.210:50010 is added to blk_1277475689_203782235{blockUCState=UNDER_CONSTRUCTION,
primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-2feb6f2f-9d30-4edd-b3c3-101194c6bde8:NORMAL|RBW],
ReplicaUnderConstruction[[DISK]DS-ca6ef4dc-be77-4a27-95e1-39b4fb53933a:NORMAL|RBW]]} size
0
2015-04-23 15:37:39,274 INFO  BlockStateChange (BlockManager.java:logAddStoredBlock(2343))
- BLOCK* addStoredBlock: blockMap updated: 192.168.146.191:50010 is added to blk_1277475690_203782236{blockUCState=UNDER_CONSTRUCTION,
primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-5dc915b5-a44d-441c-b960-87e875edb5a8:NORMAL|RBW],
ReplicaUnderConstruction[[DISK]DS-50f65630-347e-4e32-b3b9-9eb904db9577:NORMAL|RBW]]} size
0
2015-04-23 15:37:39,277 INFO  BlockStateChange (BlockManager.java:logAddStoredBlock(2343))
- BLOCK* addStoredBlock: blockMap updated: 192.168.146.34:50010 is added to blk_1083102318_9365917
size 134217728
2015-04-23 15:37:39,277 INFO  BlockStateChange (BlockManager.java:logAddStoredBlock(2343))
- BLOCK* addStoredBlock: blockMap updated: 192.168.146.191:50010 is added to blk_1277475689_203782235{blockUCState=UNDER_CONSTRUCTION,
primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-2feb6f2f-9d30-4edd-b3c3-101194c6bde8:NORMAL|RBW],
ReplicaUnderConstruction[[DISK]DS-ca6ef4dc-be77-4a27-95e1-39b4fb53933a:NORMAL|RBW]]} size
0
2015-04-23 15:37:39,299 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with
status 1
2015-04-23 15:37:39,301 INFO  namenode.NameNode (StringUtils.java:run(640)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop14667/192.168.146.67
************************************************************/
..........................................................................


    
      I tracked the log and looked at the InvalidateBlocks.invalidateWork method. 

...
synchronized List<Block> invalidateWork(final DatanodeDescriptor dn) {
    final long delay = getInvalidationDelay();
    if (delay > 0) {
      if (BlockManager.LOG.isDebugEnabled()) {
        BlockManager.LOG
            .debug("Block deletion is delayed during NameNode startup. "
                       + "The deletion will start after " + delay + " ms.");
      }
      return null;
    }
    final LightWeightHashSet<Block> set = node2blocks.get(dn);
    if (set == null) {
      return null;
    }

    // # blocks that can be sent in one message is limited
    final int limit = blockInvalidateLimit;
    final List<Block> toInvalidate = set.pollN(limit);

    // If we send everything in this message, remove this node entry
    if (set.isEmpty()) {
      remove(dn);
    }

    dn.addBlocksToBeInvalidated(toInvalidate);
    numBlocks -= toInvalidate.size();
    return toInvalidate;
  }
...
   I noticed this line:

 final LightWeightHashSet<Block> set = node2blocks.get(dn);

   It does not check whether dn is null first, so when dn is null a NullPointerException occurs.

   The ReplicationMonitor thread monitors the replication state periodically. Unfortunately,
I stopped a DataNode concurrently with a ReplicationMonitor run, so TreeMap.get(null) threw a
NullPointerException.
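The distinction matters because TreeMap, unlike HashMap, compares keys with compareTo (or a supplied Comparator) on every lookup, so a null key fails inside getEntry exactly as in the stack trace above. A minimal standalone demonstration (not HDFS code, just the JDK behavior):

```java
import java.util.HashMap;
import java.util.TreeMap;

public class NullKeyDemo {
    public static void main(String[] args) {
        // HashMap tolerates a null key lookup and simply returns null.
        HashMap<String, String> hashMap = new HashMap<>();
        System.out.println(hashMap.get(null)); // prints "null"

        // TreeMap with natural ordering must call compareTo on the key,
        // so get(null) throws NullPointerException from getEntry() --
        // the same failure seen in InvalidateBlocks.invalidateWork().
        TreeMap<String, String> treeMap = new TreeMap<>();
        try {
            treeMap.get(null);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException from TreeMap.get(null)");
        }
    }
}
```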

   I am not sure whether this is a bug.

   If it is, may I fix this issue by checking whether dn is null before calling TreeMap.get(dn)?
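The guard I have in mind is an early return before the TreeMap lookup. A runnable sketch of the idea, using hypothetical stand-ins (a String key for DatanodeDescriptor and a plain TreeMap for node2blocks -- these names are illustrative, not the actual HDFS patch):

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

public class InvalidateGuardDemo {
    // Hypothetical stand-in for InvalidateBlocks.node2blocks.
    static final TreeMap<String, List<Long>> node2blocks = new TreeMap<>();

    static List<Long> invalidateWork(String dn) {
        // Proposed guard: return early when dn is null (e.g. the DataNode
        // was unregistered concurrently with a ReplicationMonitor run),
        // so the TreeMap lookup below can never see a null key.
        if (dn == null) {
            return null;
        }
        return node2blocks.get(dn);
    }

    public static void main(String[] args) {
        node2blocks.put("dn1", Arrays.asList(1L, 2L));
        System.out.println(invalidateWork("dn1")); // known node: its block list
        System.out.println(invalidateWork(null));  // null node: null, no exception
    }
}
```

With this guard, the caller gets null (the same result as an unknown DataNode) instead of crashing the ReplicationMonitor thread.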


   
   I hope that is helpful.





> namenode  shutdown strangely
> ----------------------------
>
>                 Key: HDFS-8258
>                 URL: https://issues.apache.org/jira/browse/HDFS-8258
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: wei.he
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
