hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (Jira)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
Date Fri, 11 Oct 2019 13:25:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949441#comment-16949441
] 

Hadoop QA commented on HDFS-14646:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 17s{color} | {color:blue}
Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} |
{color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color}
| {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 56s{color}
| {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 15s{color} |
{color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 51s{color}
| {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 18s{color} |
{color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m  8s{color}
| {color:green} branch has no errors when building and testing our client artifacts. {color}
|
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 45s{color} |
{color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 24s{color} |
{color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 13s{color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  9s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  9s{color} | {color:green}
the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 45s{color}
| {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 75 unchanged
- 12 fixed = 75 total (was 87) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 13s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color}
| {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 25s{color}
| {color:green} patch has no errors when building and testing our client artifacts. {color}
|
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 59s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 30s{color} |
{color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}125m 34s{color} | {color:red}
hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 36s{color}
| {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}200m 39s{color} | {color:black}
{color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestRedudantBlocks |
|   | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
|   | hadoop.hdfs.server.datanode.TestCorruptMetadataFile |
|   | hadoop.hdfs.server.namenode.TestTransferFsImage |
|   | hadoop.hdfs.server.balancer.TestBalancerRPCDelay |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.3 Server=19.03.3 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14646 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12982752/HDFS-14646.001.patch
|
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit
 shadedclient  findbugs  checkstyle  |
| uname | Linux 193bc2b0c866 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019
x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f267917 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/28062/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
|
|  Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/28062/testReport/ |
| Max. process+thread count | 2754 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/28062/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Standby NameNode should not upload fsimage to an inappropriate NameNode.
> ------------------------------------------------------------------------
>
>                 Key: HDFS-14646
>                 URL: https://issues.apache.org/jira/browse/HDFS-14646
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.1.2
>            Reporter: Xudong Cao
>            Assignee: Xudong Cao
>            Priority: Major
>              Labels: multi-sbnn
>         Attachments: HDFS-14646.000.patch, HDFS-14646.001.patch
>
>
> *Problem Description:*
>  In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put the image
to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately
replies an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult .OLD_TRANSACTION_ID_FAILURE,
etc.), the local SNN will not terminate the put process immediately, but will put the FsImage
completely to the peer NN, and will not read the peer NN's reply until the put is completed.
> Depending on the version of Jetty, this behavior can lead to different consequences : 
> *1.Under Hadoop 2.7.2 (with Jetty 6.1.26)*
>  After peer NN called HttpServletResponse.sendError(), the underlying TCP connection
will still be established, and the data SNN sent will be read by Jetty framework itself in
the peer NN side, so the SNN will insignificantly send the FsImage to the peer NN continuously,
causing a waste of time and bandwidth. In a relatively large HDFS cluster, the size of FsImage
can often reach about 30GB, This is indeed a big waste.
> *2.Under newest release-3.2.0-RC1 (with Jetty 9.3.24) and trunk (with Jetty 9.3.27)*
>  After peer NN called HttpServletResponse.sendError(), the underlying TCP connection
will be auto closed, and then SNN will directly get an "Error writing request body to server"
exception, as below, note this test needs a relatively big FSImage (e.g. 10MB level):
> {code:java}
> 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_0000000000003364240,
fileSize: 9864721. Sent total: 524288 bytes. Size of last segment intended to send: 4096 bytes.
>  java.io.IOException: Error writing request body to server
>  at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587)
>  at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570)
>  at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396)
>  at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340)
>  at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314)
>  at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249)
>  at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277)
>  at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_0000000000003364240,
fileSize: 9864721. Sent total: 851968 bytes. Size of last segment intended to send: 4096 bytes.
>  java.io.IOException: Error writing request body to server
>  at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587)
>  at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570)
>  at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396)
>  at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) 
{code}
>                   
> *Solution:*
>  A standby NameNode should not upload fsimage to an inappropriate NameNode, when he plans
to put a FsImage to the peer NN, he need to check whether he really need to put it at this
time.
> In detail, local SNN should establish an HTTP connection with the peer NN, send the put
request, and then immediately read the response (this is the key point). If the peer NN does
not reply an HTTP_OK, it means the local SNN should not put image at this time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message