From: "Ganelin, Ilya"
To: dev@apex.incubator.apache.org
Subject: RE: Long-running HDFS Write errors
Date: Fri, 11 Mar 2016 15:12:59 +0000

This is 3.0.0.
________________________________
From: Thomas Weise
Sent: Friday, March 11, 2016 2:02:13 AM
To: dev@apex.incubator.apache.org
Subject: Re: Long-running HDFS Write errors

Which version of Malhar is this?

On Thu, Mar 10, 2016 at 10:56 PM, Ganelin, Ilya wrote:

> Hello – I have a long-running job which simultaneously writes to multiple
> files on HDFS. I am seeing the following error come up:
>
> I would appreciate any insight into what's going on here.
>
> Stopped running due to an exception.
> com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException:
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
> Failed to create file [/user/vault8/citadel_out/2016_03_20_17_34_339/records/records_I@633.txt.1457678005848.tmp]
> for [DFSClient_NONMAPREDUCE_232430238_1207] for client [10.24.28.64],
> because this file is already being created by [DFSClient_NONMAPREDUCE_-1482819983_1172] on [10.24.28.64]
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:3122)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2905)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:3186)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:3149)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:611)
>     at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.append(AuthorizationProviderProxyClientProtocol.java:124)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:416)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
>
>     at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2234)
>     at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
>     at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
>     at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
>     at com.datatorrent.lib.io.fs.AbstractFileOutputOperator.processTuple(AbstractFileOutputOperator.java:667)
>     at com.datatorrent.lib.io.fs.AbstractFileOutputOperator$1.process(AbstractFileOutputOperator.java:236)
>     at com.datatorrent.api.DefaultInputPort.put(DefaultInputPort.java:67)
>     at com.datatorrent.stram.stream.BufferServerSubscriber$BufferReservoir.sweep(BufferServerSubscriber.java:244)
>     at com.datatorrent.stram.engine.GenericNode.run(GenericNode.java:226)
>     at com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1365)
> Caused by: java.lang.RuntimeException:
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
> Failed to create file [/user/vault8/citadel_out/2016_03_20_17_34_339/records/records_I@633.txt.1457678005848.tmp]
> for [DFSClient_NONMAPREDUCE_232430238_1207] for client [10.24.28.64],
> because this file is already being created by [DFSClient_NONMAPREDUCE_-1482819983_1172] on [10.24.28.64]
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:3122)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2905)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:3186)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:3149)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:611)
>     at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.append(AuthorizationProviderProxyClientProtocol.java:124)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:416)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
>
>     at com.datatorrent.lib.io.fs.AbstractFileOutputOperator$3.load(AbstractFileOutputOperator.java:414)
>     at com.datatorrent.lib.io.fs.AbstractFileOutputOperator$3.load(AbstractFileOutputOperator.java:334)
>     at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
>     at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
>     at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
>     at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
>     ... 9 more
> Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
> Failed to create file [/user/vault8/citadel_out/2016_03_20_17_34_339/records/records_I@633.txt.1457678005848.tmp]
> for [DFSClient_NONMAPREDUCE_232430238_1207] for client [10.24.28.64],
> because this file is already being created by [DFSClient_NONMAPREDUCE_-1482819983_1172] on [10.24.28.64]
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:3122)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2905)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:3186)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:3149)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:611)
>     at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.append(AuthorizationProviderProxyClientProtocol.java:124)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:416)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
>
>     at org.apache.hadoop.ipc.Client.call(Client.java:1472)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1403)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>     at com.sun.proxy.$Proxy14.append(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.append(ClientNamenodeProtocolTranslatorPB.java:313)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>     at com.sun.proxy.$Proxy15.append(Unknown Source)
>     at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1842)
>     at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1878)
>     at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1871)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:329)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:325)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:325)
>     at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1172)
>     at com.datatorrent.lib.io.fs.AbstractFileOutputOperator$3.load(AbstractFileOutputOperator.java:371)
>     ... 14 more
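
For reference, the AlreadyBeingCreatedException above is the NameNode
refusing an append because a different DFSClient instance still holds the
lease on the file. A minimal sketch of the same conflict outside Apex
(this assumes fs.defaultFS points at a reachable HDFS cluster; the class
name and path below are illustrative, not taken from the job above):

    import java.nio.charset.StandardCharsets;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LeaseConflictSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path("/tmp/lease-conflict-demo.txt");

        // Two independent FileSystem instances get two distinct DFSClients,
        // simulating two writers that resolve to the same file name.
        FileSystem fs1 = FileSystem.newInstance(conf);
        FileSystem fs2 = FileSystem.newInstance(conf);

        fs1.create(path).close();

        // The first client opens the file for append and holds the lease.
        FSDataOutputStream out1 = fs1.append(path);
        out1.write("writer one\n".getBytes(StandardCharsets.UTF_8));

        try {
          // While the first lease is held, a second append is typically
          // rejected by the NameNode with
          // RemoteException(AlreadyBeingCreatedException), as in the trace.
          fs2.append(path);
        } catch (java.io.IOException e) {
          System.err.println("Lease conflict: " + e.getMessage());
        } finally {
          out1.close();
          fs1.close();
          fs2.close();
        }
      }
    }

In the operator, the trace shows AbstractFileOutputOperator reloading a
cached output stream and appending to an existing part file, so one way
this collision can occur is two physical partitions resolving getFileName()
to the same path (another is a restarted container re-opening a file whose
previous lease has not yet been recovered). A hypothetical subclass sketch
that keys file names by the physical operator id, so each writer appends
only to its own file (class and file names here are illustrative):

    import java.nio.charset.StandardCharsets;
    import com.datatorrent.api.Context.OperatorContext;
    import com.datatorrent.lib.io.fs.AbstractFileOutputOperator;

    public class PerWriterFileOutputOperator
        extends AbstractFileOutputOperator<String> {

      private transient int operatorId;

      @Override
      public void setup(OperatorContext context) {
        super.setup(context);
        // Each physical partition has a unique operator id in the app.
        operatorId = context.getId();
      }

      @Override
      protected String getFileName(String tuple) {
        // Embed the operator id so no two partitions share a file.
        return "records_" + operatorId + ".txt";
      }

      @Override
      protected byte[] getBytesForTuple(String tuple) {
        return (tuple + "\n").getBytes(StandardCharsets.UTF_8);
      }
    }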