hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11486) Client close() should not fail fast if the last block is being decommissioned
Date Tue, 28 Mar 2017 04:11:41 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15944502#comment-15944502
] 

Hadoop QA commented on HDFS-11486:
----------------------------------

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 25s{color} | {color:blue}
Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} |
{color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color}
| {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 11s{color}
| {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 49s{color} |
{color:green} branch-2.8 passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 45s{color} |
{color:green} branch-2.8 passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 20s{color}
| {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 55s{color} |
{color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 16s{color}
| {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  1s{color} |
{color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 56s{color} |
{color:green} branch-2.8 passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 33s{color} |
{color:green} branch-2.8 passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 43s{color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 38s{color} |
{color:green} the patch passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 38s{color} | {color:green}
the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 39s{color} |
{color:green} the patch passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 39s{color} | {color:green}
the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 18s{color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 48s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 12s{color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color}
| {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  5s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 51s{color} |
{color:green} the patch passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 30s{color} |
{color:green} the patch passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 46m 46s{color} | {color:green}
hadoop-hdfs in the patch passed with JDK v1.7.0_121. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 20s{color}
| {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}123m 12s{color} | {color:black}
{color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_121 Failed junit tests | hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork
|
|   | hadoop.hdfs.server.namenode.TestCheckpoint |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:5af2af1 |
| JIRA Issue | HDFS-11486 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12860783/HDFS-11486-branch-2.8.003.patch
|
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs
 checkstyle  |
| uname | Linux 1f2c980adf74 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016
x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2.8 / b218676 |
| Default Java | 1.7.0_121 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_121 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_121
|
| findbugs | v3.0.0 |
| JDK v1.7.0_121  Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/18862/testReport/
|
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/18862/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Client close() should not fail fast if the last block is being decommissioned
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-11486
>                 URL: https://issues.apache.org/jira/browse/HDFS-11486
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>         Attachments: HDF-11486.test.patch, HDFS-11486.001.patch, HDFS-11486.002.patch,
HDFS-11486.003.patch, HDFS-11486-branch-2.8.003.patch, HDFS-11486.test-inmaintenance.patch
>
>
> If a DFS client closes a file while the last block is being decommissioned, the close()
may fail if the decommission of the block does not complete in a few seconds.
> When a DataNode is being decommissioned, NameNode marks the DN's state as DECOMMISSION_INPROGRESS_INPROGRESS,
and blocks with replicas on these DataNodes become under-replicated immediately. A close()
call which attempts to complete the last open block will fail if the number of live replicas
is below minimal replicated factor, due to too many replicas residing on the DataNodes.
> The client internally will try to complete the last open block for up to 5 times by default,
which is roughly 12 seconds. After that, close() throws an exception like the following, which
is typically not handled properly.
> {noformat}
> java.io.IOException: Unable to close file because the last blockBP-33575088-10.0.0.200-1488410554081:blk_1073741827_1003
does not have enough number of replicas.
> 	at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:864)
> 	at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:827)
> 	at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:793)
> 	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> 	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
> 	at org.apache.hadoop.hdfs.TestDecommission.testCloseWhileDecommission(TestDecommission.java:708)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {noformat}
> Once the exception is thrown, the client usually does not attempt to close again, so
the file remains in open state, and the last block remains in under replicated state.
> Subsequently, administrator runs recoverLease tool to salvage the file, but the attempt
failed because the block remains in under replicated state. It is not clear why the block
is never replicated though. However, administrators think it becomes a corrupt file because
the file remains open via fsck -openforwrite and the file modification time is hours ago.
> In summary, I do not think close() should fail because the last block is being decommissioned.
The block has sufficient number replicas, and it's just that some replicas are being decommissioned.
Decomm should be transparent to clients.
> This issue seems to be more prominent on a very large scale cluster, with min replication
factor set to 2.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message