hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10490) Client may never recovery replica after a timeout during sending packet
Date Mon, 06 Jun 2016 05:40:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316213#comment-15316213
] 

Hadoop QA commented on HDFS-10490:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s {color} | {color:blue}
Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green}
The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red}
The patch doesn't appear to include any new or modified tests. Please justify why no new tests
are needed for this patch. Also please list what manual steps were performed to verify this
patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 57s {color}
| {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s {color} |
{color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s {color}
| {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s {color} |
{color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color}
| {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 36s {color} |
{color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s {color} | {color:green}
trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 46s {color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s {color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s {color} | {color:green}
the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s {color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s {color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 8s {color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color}
| {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 45s {color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s {color} | {color:green}
the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 74m 4s {color} | {color:red}
hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color}
| {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 92m 9s {color} | {color:black}
{color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestFsDatasetCache |
| Timed out junit tests | org.apache.hadoop.hdfs.TestLeaseRecovery2 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:2c91fd8 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12808232/HDFS-10490.patch
|
| JIRA Issue | HDFS-10490 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs
 checkstyle  |
| uname | Linux 0bfae3682fd4 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12
UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 106234d |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/15658/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
|
| unit test logs |  https://builds.apache.org/job/PreCommit-HDFS-Build/15658/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
|
|  Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/15658/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/15658/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Client may never recovery replica after a timeout during sending packet
> -----------------------------------------------------------------------
>
>                 Key: HDFS-10490
>                 URL: https://issues.apache.org/jira/browse/HDFS-10490
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.6.0
>            Reporter: He Tianyi
>         Attachments: HDFS-10490.patch
>
>
> For newly created replica, a meta file is created in constructor of {{BlockReceiver}}
(for {{WRITE_BLOCK}} op). Its header will be written lazily (buffered in memory first by {{BufferedOutputStream}}).

> If following packets fail to deliver (e.g. in  extreme network condition), the header
may never get flush until closed. 
> However, {{BlockReceiver}} will not call close until block receiving is finished or exception(s)
encountered. Also in extreme network condition, both RST & FIN may not deliver in time.

> In this case, if client tries to initiates a {{transferBlock}} to a new datanode (in
{{addDatanode2ExistingPipeline}}), existing datanode will see an empty meta if its {{BlockReceiver}}
did not close in time. 
> Then, after HDFS-3429, a default {{DataChecksum}} (NULL, 512) will be used during transfer.
So when client then tries to recover pipeline after completely transferred, it may encounter
the following exception:
> {noformat}
> java.io.IOException: Client requested checksum DataChecksum(type=CRC32C, chunkSize=4096)
when appending to an existing block with different chunk size: DataChecksum(type=NULL, chunkSize=512)
>         at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.createStreams(ReplicaInPipeline.java:230)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:226)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:798)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:76)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:243)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This will repeat, until exhausted by datanode replacement policy.
> Also to note that, with bad luck (like I), 20k clients are all doing this. It's to some
extend a DDoS attack to NameNode (because of getAdditionalDataNode calls).
> I suggest we flush immediately after header is written, preventing anybody from seeing
empty meta file for avoiding the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message