Date: Wed, 29 Feb 2012 22:55:58 +0000 (UTC)
From: "Aaron T. Myers (Updated) (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-3031) HA: Error (failed to close file) when uploading large file + kill active NN + manual failover

     [ https://issues.apache.org/jira/browse/HDFS-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HDFS-3031:
---------------------------------

    Issue Type: Sub-task  (was: Bug)
        Parent: HDFS-1623

> HA: Error (failed to close file) when uploading large file + kill active NN + manual failover
> ---------------------------------------------------------------------------------------------
>
>                 Key: HDFS-3031
>                 URL: https://issues.apache.org/jira/browse/HDFS-3031
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Stephen Chu
>         Attachments: styx01_killNNfailover, styx01_uploadLargeFile
>
>
> I executed section 3.4 of Todd's HA test plan (https://issues.apache.org/jira/browse/HDFS-1623):
> 1. A large file upload is started.
> 2. While the file is being uploaded, the administrator kills the first NN and performs a failover.
> 3. After the file finishes being uploaded, it is verified for correct length and contents.
> For the test, I used a VM template, styx01:/home/schu/centos64-2-5.5.qcow2, as the large file. styx01 hosted the active NN and styx02 hosted the standby NN.
> In the attached log files, you can see that I began the file upload on styx01:
> hadoop fs -put centos64-2.5.5.qcow2
> After waiting several seconds, I killed the active NN on styx01 with kill -9 and manually failed over to the NN on styx02. I ran into the exception below.
> (The rest of the stack trace is in the attached file styx01_uploadLargeFile.)
> 12/02/29 14:12:52 WARN retry.RetryInvocationHandler: A failover has occurred since the start of this method invocation attempt.
> put: Failed on local exception: java.io.EOFException; Host Details : local host is: "styx01.sf.cloudera.com/172.29.5.192"; destination host is: ""styx01.sf.cloudera.com":12020;
> 12/02/29 14:12:52 ERROR hdfs.DFSClient: Failed to close file /user/schu/centos64-2-5.5.qcow2._COPYING_
> java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: "styx01.sf.cloudera.com/172.29.5.192"; destination host is: ""styx01.sf.cloudera.com":12020;
>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1145)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:188)
>         at $Proxy9.addBlock(Unknown Source)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:302)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>         at $Proxy10.addBlock(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1097)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:973)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:455)
> Caused by: java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:375)
>         at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:830)
>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:762)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
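Step 3 of the test plan quoted in this issue (verifying the uploaded file for correct length and contents) could be sketched as follows. This is a minimal illustration, not part of the report or of Hadoop: it assumes the uploaded file has first been copied back to local disk (for example with hadoop fs -get), and the class name UploadVerifier and its methods are hypothetical. It compares byte lengths first, so a truncated upload fails cheaply, and then compares SHA-256 digests computed by streaming each file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Hypothetical helper for step 3 of the test plan: compare a local source
 * file with a local copy of the uploaded file (for example, one fetched
 * back with `hadoop fs -get`) by length and by SHA-256 digest.
 */
public class UploadVerifier {

    // Streams the file through SHA-256 so a large VM image need not fit in memory.
    static byte[] sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return md.digest();
    }

    // True only if both the byte length and the content digest match.
    static boolean sameLengthAndContents(Path original, Path copy)
            throws IOException, NoSuchAlgorithmException {
        if (Files.size(original) != Files.size(copy)) {
            return false; // cheap check first: a truncated upload fails here
        }
        return MessageDigest.isEqual(sha256(original), sha256(copy));
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: UploadVerifier <original> <downloaded-copy>");
            System.exit(2);
        }
        boolean ok = sameLengthAndContents(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(ok ? "OK: length and contents match" : "MISMATCH");
        System.exit(ok ? 0 : 1);
    }
}
```

After compiling, it would be run with the original image and the downloaded copy as its two arguments; it exits with status 0 only when both checks pass, which makes it easy to call from a test script.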