hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10240) Race between close/recoverLease leads to missing block
Date Thu, 31 Mar 2016 12:24:25 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15219800#comment-15219800
] 

Hadoop QA commented on HDFS-10240:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s {color} | {color:blue}
Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green}
The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red}
The patch doesn't appear to include any new or modified tests. Please justify why no new tests
are needed for this patch. Also please list what manual steps were performed to verify this
patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 23s {color}
| {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 13s {color} |
{color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s {color} | {color:green}
trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s {color}
| {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 13s {color} |
{color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 20s {color}
| {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 28s {color} |
{color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 38s {color} |
{color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 46s {color} |
{color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 5s {color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 5s {color} | {color:green}
the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 5s {color} | {color:green}
the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s {color} |
{color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s {color} | {color:green}
the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s {color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 8s {color} | {color:green}
the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color}
| {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 48s {color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 36s {color} |
{color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 41s {color} |
{color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 100m 25s {color} | {color:red}
hadoop-hdfs in the patch failed with JDK v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 88m 20s {color} | {color:red}
hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s {color}
| {color:green} Patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 225m 16s {color} | {color:black}
{color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_74 Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner |
|   | hadoop.hdfs.TestEncryptionZones |
|   | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
|   | hadoop.hdfs.server.balancer.TestBalancer |
|   | hadoop.hdfs.TestReconstructStripedFile |
|   | hadoop.hdfs.server.datanode.TestDataNodeUUID |
|   | hadoop.hdfs.security.TestDelegationTokenForProxyUser |
|   | hadoop.hdfs.server.datanode.TestDataNodeMetrics |
| JDK v1.7.0_95 Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner |
|   | hadoop.hdfs.server.namenode.TestEditLog |
|   | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12796262/HDFS-10240-001.patch
|
| JIRA Issue | HDFS-10240 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs
 checkstyle  |
| uname | Linux 3009665c44df 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12
UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 0064cba |
| Default Java | 1.7.0_95 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_74 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
|
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/15023/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_74.txt
|
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/15023/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
|
| unit test logs |  https://builds.apache.org/job/PreCommit-HDFS-Build/15023/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_74.txt
https://builds.apache.org/job/PreCommit-HDFS-Build/15023/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
|
| JDK v1.7.0_95  Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/15023/testReport/
|
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/15023/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> Race between close/recoverLease leads to missing block
> ------------------------------------------------------
>
>                 Key: HDFS-10240
>                 URL: https://issues.apache.org/jira/browse/HDFS-10240
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: zhouyingchao
>            Assignee: zhouyingchao
>         Attachments: HDFS-10240-001.patch
>
>
> We got a missing block in our cluster, and logs related to the missing block are as follows:
> 2016-03-28,10:00:06,188 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock:
XXXXXXXXXX. BP-219149063-10.108.84.25-1446859315800 blk_1226490256_153006345{blockUCState=UNDER_CONSTRUCTION,
primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW],
ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]}
> 2016-03-28,10:00:06,205 INFO BlockStateChange: BLOCK* blk_1226490256_153006345{blockUCState=UNDER_RECOVERY,
primaryNodeIndex=2, replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW],
ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]}
recovery started, primary=ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]
> 2016-03-28,10:00:06,205 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease:
File XXXXXXXXXX has not been closed. Lease recovery is in progress. RecoveryId = 153006357
for block blk_1226490256_153006345{blockUCState=UNDER_RECOVERY, primaryNodeIndex=2, replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW],
ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]}
> 2016-03-28,10:00:06,248 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK*
checkFileProgress: blk_1226490256_153006345{blockUCState=COMMITTED, primaryNodeIndex=2, replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW],
ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-85819f0d-bdbb-4a9b-b90c-eba078547c23:NORMAL|RBW]]}
has not reached minimal replication 1
> 2016-03-28,10:00:06,358 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated:
10.114.5.53:11402 is added to blk_1226490256_153006345{blockUCState=COMMITTED, primaryNodeIndex=2,
replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW],
ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-85819f0d-bdbb-4a9b-b90c-eba078547c23:NORMAL|RBW]]}
size 139
> 2016-03-28,10:00:06,441 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated:
10.114.5.44:11402 is added to blk_1226490256_153006345 size 139
> 2016-03-28,10:00:06,660 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated:
10.114.6.14:11402 is added to blk_1226490256_153006345 size 139
> 2016-03-28,10:00:08,808 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(lastblock=BP-219149063-10.108.84.25-1446859315800:blk_1226490256_153006345,
newgenerationstamp=153006357, newlength=139, newtargets=[10.114.6.14:11402, 10.114.5.53:11402,
10.114.5.44:11402], closeFile=true, deleteBlock=false)
> 2016-03-28,10:00:08,836 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap:
blk_1226490256 added as corrupt on 10.114.6.14:11402 by /10.114.6.14 because block is COMPLETE
and reported genstamp 153006357 does not match genstamp in block map 153006345
> 2016-03-28,10:00:08,836 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap:
blk_1226490256 added as corrupt on 10.114.5.53:11402 by /10.114.5.53 because block is COMPLETE
and reported genstamp 153006357 does not match genstamp in block map 153006345
> 2016-03-28,10:00:08,837 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap:
blk_1226490256 added as corrupt on 10.114.5.44:11402 by /10.114.5.44 because block is COMPLETE
and reported genstamp 153006357 does not match genstamp in block map 153006345
> From the log, I guess this is what has happened in order:
> 1  Process A open a file F for write.
> 2  Somebody else called recoverLease against F.
> 3  A closed F.
> The root cause of the missing block is that recoverLease increased gen count of blocks
on datanode whereas the gen count on Namenode is reset by close in step 3. I think we should
check if the last block is under recovery when trying to close a file. If so we should just
throw an exception to client quickly.
> Although the issue is found on a 2.4 HDFS, it looks like the issue also exist on the
trunk from code. I'll post a patch soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message