hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hudson (Jira)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-14311) Multi-threading conflict at layoutVersion when loading block pool storage
Date Tue, 20 Aug 2019 19:40:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-14311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911692#comment-16911692
] 

Hudson commented on HDFS-14311:
-------------------------------

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17152 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17152/])
HDFS-14311. Multi-threading conflict at layoutVersion when loading block (weichiu: rev 4cb22cd867a9295efc815dc95525b5c3e5960ea6)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java


> Multi-threading conflict at layoutVersion when loading block pool storage
> -------------------------------------------------------------------------
>
>                 Key: HDFS-14311
>                 URL: https://issues.apache.org/jira/browse/HDFS-14311
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: rolling upgrades
>    Affects Versions: 2.9.2
>            Reporter: Yicong Cai
>            Assignee: Yicong Cai
>            Priority: Major
>             Fix For: 2.10.0, 3.3.0, 2.8.6, 3.2.1, 2.9.3, 3.1.3
>
>         Attachments: HDFS-14311.1.patch, HDFS-14311.2.patch, HDFS-14311.branch-2.1.patch
>
>
> When DataNode upgrade from 2.7.3 to 2.9.2, there is a conflict at StorageInfo.layoutVersion
in loading block pool storage process.
> It will cause this exception:
>  
> {panel:title=exceptions}
> 2019-02-15 10:18:01,357 [13783] - INFO [Thread-33:BlockPoolSliceStorage@395] - Restored
36974 block files from trash before the layout upgrade. These blocks will be moved to the
previous directory during the upgrade
> 2019-02-15 10:18:01,358 [13784] - WARN [Thread-33:BlockPoolSliceStorage@226] - Failed
to analyze storage directories for block pool BP-1216718839-10.120.232.23-1548736842023
> java.io.IOException: Datanode state: LV = -57 CTime = 0 is newer than the namespace state:
LV = -63 CTime = 0
>  at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:406)
>  at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadStorageDirectory(BlockPoolSliceStorage.java:177)
>  at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:221)
>  at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
>  at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:460)
>  at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:390)
>  at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:556)
>  at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
>  at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
>  at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:388)
>  at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
>  at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-02-15 10:18:01,358 [13784] - WARN [Thread-33:DataStorage@472] - Failed to add storage
directory [DISK]file:/mnt/dfs/2/hadoop/hdfs/data/ for block pool BP-1216718839-10.120.232.23-1548736842023
> java.io.IOException: Datanode state: LV = -57 CTime = 0 is newer than the namespace state:
LV = -63 CTime = 0
>  at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:406)
>  at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadStorageDirectory(BlockPoolSliceStorage.java:177)
>  at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:221)
>  at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
>  at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:460)
>  at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:390)
>  at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:556)
>  at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
>  at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
>  at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:388)
>  at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
>  at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
>  at java.lang.Thread.run(Thread.java:748) 
> {panel}
>  
> root cause:
> BlockPoolSliceStorage instance is shared for all storage locations recover transition.
In BlockPoolSliceStorage.doTransition, it will read the old layoutVersion from local storage,
compare with current DataNode version, then do upgrade. In doUpgrade, add the transition
work as a sub-thread, the transition work will set the BlockPoolSliceStorage's layoutVersion
to current DN version. The next storage dir transition check will concurrent with pre storage
dir real transition work, then the BlockPoolSliceStorage instance layoutVersion will confusion.
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message