hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Wei-Chiu Chuang (Jira)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-14800) Data race between block report and recoverLease()
Date Fri, 30 Aug 2019 00:30:00 GMT

     [ https://issues.apache.org/jira/browse/HDFS-14800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Wei-Chiu Chuang updated HDFS-14800:
-----------------------------------
    Attachment: PreCommit-HDFS-Build #27717 test - testUpgradeFromRel1BBWImage [Jenkins].pdf

> Data race between block report and recoverLease()
> -------------------------------------------------
>
>                 Key: HDFS-14800
>                 URL: https://issues.apache.org/jira/browse/HDFS-14800
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Wei-Chiu Chuang
>            Priority: Major
>         Attachments: PreCommit-HDFS-Build #27717 test - testUpgradeFromRel1BBWImage [Jenkins].pdf
>
>
> I thought I fixed it in HDFS-10240, but I am seeing a similar race condition in a precommit
test again.
> https://builds.apache.org/job/PreCommit-HDFS-Build/27717/testReport/org.apache.hadoop.hdfs/TestDFSUpgradeFromImage/testUpgradeFromRel1BBWImage/
> {noformat}
> 2019-08-29 13:34:31,788 [IPC Server handler 9 on default port 39637] INFO  namenode.FSNamesystem
(FSNamesystem.java:recoverLeaseInternal(2682)) - recoverLease: [Lease.  Holder: DFSClient_8256078,
pending creates: 13], src=/1kb-multiple-checksum-blocks-64-16 from client DFSClient_8256078
> 2019-08-29 13:34:31,788 [IPC Server handler 9 on default port 39637] INFO  namenode.FSNamesystem
(FSNamesystem.java:internalReleaseLease(3424)) - Recovering [Lease.  Holder: DFSClient_8256078,
pending creates: 13], src=/1kb-multiple-checksum-blocks-64-16
> 2019-08-29 13:34:31,789 [IPC Server handler 9 on default port 39637] WARN  BlockStateChange
(BlockUnderConstructionFeature.java:initializeBlockRecovery(238)) - BLOCK* BlockUnderConstructionFeature.initializeBlockRecovery:
No blocks found, lease removed.
> 2019-08-29 13:34:31,790 [IPC Server handler 9 on default port 39637] WARN  hdfs.StateChange
(FSNamesystem.java:internalReleaseLease(3550)) - DIR* NameSystem.internalReleaseLease: File
/1kb-multiple-checksum-blocks-64-16 has not been closed. Lease recovery is in progress. RecoveryId
= 1031 for block blk_7162739548153522810_1020
> 2019-08-29 13:34:31,791 [Block report processor] INFO  BlockStateChange (BlockManager.java:processReport(2645))
- BLOCK* processReport 0x8f0bcadff51597e8: Processing first storage report for DS-dd914776-8c9f-4f0d-9bd3-91a552bdc351
from datanode 187b2e09-75e8-4bd0-ab46-90e32839a21d
> 2019-08-29 13:34:31,793 [Block report processor] INFO  BlockStateChange (BlockManager.java:processReport(2674))
- BLOCK* processReport 0x8f0bcadff51597e8: from storage DS-dd914776-8c9f-4f0d-9bd3-91a552bdc351
node DatanodeRegistration(127.0.0.1:32987, datanodeUuid=187b2e09-75e8-4bd0-ab46-90e32839a21d,
infoPort=42569, infoSecurePort=0, ipcPort=39147, storageInfo=lv=-57;cid=testClusterId;nsid=889473900;c=1567085670457),
blocks: 13, hasStaleStorage: false, processing time: 2 msecs, invalidatedBlocks: 0
> 2019-08-29 13:34:31,808 [BP-268013932-172.17.0.2-1567085670423 heartbeating to localhost/127.0.0.1:39637]
INFO  datanode.DataNode (BPServiceActor.java:blockReport(432)) - Successfully sent block report
0x8f0bcadff51597e8,  containing 1 storage report(s), of which we sent 1. The reports had 13
total blocks and used 1 RPC(s). This took 4 msecs to generate and 33 msecs for RPC and NN
processing. Got back no commands.
> 2019-08-29 13:34:32,795 [IPC Server handler 8 on default port 39637] INFO  namenode.FSNamesystem
(FSNamesystem.java:recoverLeaseInternal(2682)) - recoverLease: [Lease.  Holder: DFSClient_NONMAPREDUCE_1352689944_1,
pending creates: 1], src=/1kb-multiple-checksum-blocks-64-16 from client DFSClient_NONMAPREDUCE_1352689944_1
> 2019-08-29 13:34:32,795 [IPC Server handler 8 on default port 39637] INFO  namenode.FSNamesystem
(FSNamesystem.java:internalReleaseLease(3424)) - Recovering [Lease.  Holder: DFSClient_NONMAPREDUCE_1352689944_1,
pending creates: 1], src=/1kb-multiple-checksum-blocks-64-16
> 2019-08-29 13:34:32,796 [IPC Server handler 8 on default port 39637] INFO  blockmanagement.BlockManager
(PendingRecoveryBlocks.java:add(80)) - Block recovery attempt for blk_7162739548153522810_1020
rejected, as the previous attempt times out in 88 seconds.
> {noformat}
> Looks like if client calls recoverLease on a file before NameNod receives a block report
from DataNodes, it will fail to recover the lease, and the file remains unclosed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message