hadoop-hdfs-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tao Yang (Jira)" <j...@apache.org>
Subject [jira] [Created] (HDFS-15105) Standby NN exits and fails to restart due to edit log corruption
Date Thu, 09 Jan 2020 12:41:00 GMT
Tao Yang created HDFS-15105:
-------------------------------

             Summary: Standby NN exits and fails to restart due to edit log corruption
                 Key: HDFS-15105
                 URL: https://issues.apache.org/jira/browse/HDFS-15105
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 2.8.0
            Reporter: Tao Yang


We found a issue that Standby NN exited and failed to restart until we resolved the edit log
corruption.
 Error logs:
{noformat}
java.io.IOException: Mismatched block IDs or generation stamps, attempting to replace block
blk_74288647857_73526148211 with blk_74288647857_73526377369 as block # 15/17 of /maindump/mainv10/dump_online/lasttable/20200105015500/part-319
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:1019)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:431)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:234)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:885)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:866)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:234)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:342)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:295)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:312)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:455)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:308)
{noformat}

Related edit log transactions of the same file:
{noformat}
1. TXID=444341628498  time=1578251449632
OP_UPDATE_BLOCKS
blocks: ... blk_74288647857_73526148211   blk_74454090866_73526215536

2. TXID=444342382774   time=1578251520740
OP_REASSIGN_LEASE

3. TXID=444342401216  time=1578251522779
OP_CLOSE
blocks: ... blk_74288647857_73526377369   blk_74454090866_73526374095

4. TXID=444342401394
OP_SET_GENSTAMP_V2 
generate stamp: 73526377369   !!!! this stamp is generated but already used in the previous
edit log

5. TXID=444342401395  time=1578251522835  (03:12:02,835)
OP_TRUNCATE

6. TXID=444342402176  time=1578251523246  (03:12:03,246)
OP_CLOSE
blocks: ... blk_74288647857_73526377369 
{noformat}

According to the edit logs, it's wield to see that stamp(73526377369) was generated in transaction
4 but already used in transaction 3, and for transaction 3 there should be only the last block
changed but in fact the last two blocks are both changed.

This problem might be produced in a complex scenario that truncate operation immediately followed
the recover-lease operation for the same file. A suspicious point is that between creation
and being written for transaction 3, stamp of the second last block was updated when committing
block synchronization caused by the truncate operation.
Related calling stack is as follows: 
{noformat}
NameNodeRpcServer#commitBlockSynchronization
  FSNamesystem#commitBlockSynchronization
    // update last block
    if(!copyTruncate) {
      storedBlock.setGenerationStamp(newgenerationstamp); //updated now!!!
      storedBlock.setNumBytes(newlength);
    }
{noformat}

Any comments are welcome. Thanks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org


Mime
View raw message