hadoop-hdfs-issues mailing list archives

From "Tsuyoshi OZAWA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again
Date Thu, 19 Sep 2013 02:39:52 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771538#comment-13771538 ]

Tsuyoshi OZAWA commented on HDFS-5225:
--------------------------------------

Colin and Roman,

Thank you for sharing! Now I understand the problem correctly.

Junping,

Both blockMap and blockInfoSet must be synchronized on the BlockPoolSliceScanner instance in
this case. The verifyFirstBlock method reads blockInfoSet, but it is not synchronized. IMHO, if
we make the verifyFirstBlock method synchronized, this problem may be solved. What do you
think?
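
As a rough illustration of the idea (a sketch only, not the actual BlockPoolSliceScanner
source: the field names blockInfoSet and blockMap come from the discussion above, and the
method body and the verifyBlock call are simplified), the change would look roughly like this:

{noformat}
// Illustrative sketch, assuming blockInfoSet holds the blocks scheduled for
// verification and that other methods (e.g. deleteBlock) already synchronize
// on the scanner instance when they update blockMap / blockInfoSet.
private synchronized void verifyFirstBlock() {
  if (blockInfoSet.isEmpty()) {
    return;                            // nothing scheduled for verification
  }
  // Reading the head of blockInfoSet is safe here: the instance lock is held,
  // so a concurrent deleteBlock() cannot remove the entry underneath us.
  BlockScanInfo info = blockInfoSet.first();
  verifyBlock(info);                   // call simplified for illustration
}
{noformat}

One trade-off to consider: declaring the whole method synchronized keeps the instance lock held
while the block is verified, so holding the lock only around the blockInfoSet access and doing
the verification outside it may be an alternative worth discussing.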
                
> datanode keeps logging the same 'is no longer in the dataset' message over and over again
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-5225
>                 URL: https://issues.apache.org/jira/browse/HDFS-5225
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.1.1-beta
>            Reporter: Roman Shaposhnik
>
> I was running the usual Bigtop testing on 2.1.1-beta RC1 with the following configuration: a 4-node fully distributed cluster with security on.
> All of a sudden my DN ate up all of the space in /var/log, logging the following message repeatedly:
> {noformat}
> 2013-09-18 20:51:12,046 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369 is no longer in the dataset
> {noformat}
> It wouldn't respond to jstack, and jstack -F ended up being useless.
> Here's what I was able to find in the NameNode logs regarding this block ID:
> {noformat}
> fgrep -rI 'blk_1073742189' hadoop-hdfs-namenode-ip-10-224-158-152.log
> 2013-09-18 18:03:16,972 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /user/jenkins/testAppendInputWedSep18180222UTC2013/test4.fileWedSep18180222UTC2013._COPYING_. BP-1884637155-10.224.158.152-1379524544853 blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], ReplicaUnderConstruction[10.34.74.206:1004|RBW], ReplicaUnderConstruction[10.224.158.152:1004|RBW]]}
> 2013-09-18 18:03:17,222 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.224.158.152:1004 is added to blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], ReplicaUnderConstruction[10.34.74.206:1004|RBW], ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0
> 2013-09-18 18:03:17,222 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.34.74.206:1004 is added to blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], ReplicaUnderConstruction[10.34.74.206:1004|RBW], ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0
> 2013-09-18 18:03:17,224 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.83.107.80:1004 is added to blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], ReplicaUnderConstruction[10.34.74.206:1004|RBW], ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0
> 2013-09-18 18:03:17,899 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(block=BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369, newGenerationStamp=1370, newLength=1048576, newNodes=[10.83.107.80:1004, 10.34.74.206:1004, 10.224.158.152:1004], clientName=DFSClient_NONMAPREDUCE_-450304083_1)
> 2013-09-18 18:03:17,904 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369) successfully to BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370
> 2013-09-18 18:03:18,540 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(block=BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370, newGenerationStamp=1371, newLength=2097152, newNodes=[10.83.107.80:1004, 10.34.74.206:1004, 10.224.158.152:1004], clientName=DFSClient_NONMAPREDUCE_-450304083_1)
> 2013-09-18 18:03:18,548 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370) successfully to BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1371
> 2013-09-18 18:03:26,150 INFO BlockStateChange: BLOCK* addToInvalidates: blk_1073742189_1371 10.83.107.80:1004 10.34.74.206:1004 10.224.158.152:1004
> 2013-09-18 18:03:26,847 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* InvalidateBlocks: ask 10.34.74.206:1004 to delete [blk_1073742178_1359, blk_1073742183_1362, blk_1073742184_1363, blk_1073742186_1366, blk_1073742188_1368, blk_1073742189_1371]
> 2013-09-18 18:03:29,848 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* InvalidateBlocks: ask 10.224.158.152:1004 to delete [blk_1073742177_1353, blk_1073742178_1359, blk_1073742179_1355, blk_1073742180_1356, blk_1073742181_1358, blk_1073742182_1361, blk_1073742185_1364, blk_1073742187_1367, blk_1073742188_1368, blk_1073742189_1371]
> 2013-09-18 18:03:29,848 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* InvalidateBlocks: ask 10.83.107.80:1004 to delete [blk_1073742177_1353, blk_1073742178_1359, blk_1073742179_1355, blk_1073742180_1356, blk_1073742181_1358, blk_1073742182_1361, blk_1073742183_1362, blk_1073742184_1363, blk_1073742185_1364, blk_1073742186_1366, blk_1073742187_1367, blk_1073742188_1368, blk_1073742189_1371]
> {noformat}
> This seems to suggest that the block was successfully deleted, but then the DN got into a death spiral inside the scanner.
> I can keep the cluster running for a few days if anybody is willing to take a look. Ask me for creds via a personal email.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
