From: Thomas Weise
To: dev@apex.incubator.apache.org
Date: Fri, 11 Mar 2016 07:24:11 -0800
Subject: Re: Long-running HDFS Write errors

Does this happen after operator recovery, or before any other failure
occurs? Is it possible that multiple partitions write to the same
directory?
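If the second case is possible, a quick way to rule it out is to make each
partition's file name unique. A minimal sketch against Malhar's
AbstractFileOutputOperator (the class name and the fixed base file name are
illustrative, not taken from the application in this thread):

    import com.datatorrent.api.Context.OperatorContext;
    import com.datatorrent.lib.io.fs.AbstractFileOutputOperator;

    // Hypothetical writer that suffixes the file name with the physical
    // operator (partition) id so that no two partitions ever append to
    // the same HDFS path.
    public class PartitionSafeWriter extends AbstractFileOutputOperator<byte[]>
    {
      private transient int operatorId;

      @Override
      public void setup(OperatorContext context)
      {
        // context.getId() is unique per physical partition and stable
        // across recovery.
        operatorId = context.getId();
        super.setup(context);
      }

      @Override
      protected String getFileName(byte[] tuple)
      {
        return "records_I.txt." + operatorId; // illustrative base name
      }

      @Override
      protected byte[] getBytesForTuple(byte[] tuple)
      {
        return tuple;
      }
    }

If the file names are already partition-unique, the remaining suspect is a
lease held by a previous incarnation of the same operator; see the note
after the quoted trace below.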
On Fri, Mar 11, 2016 at 7:12 AM, Ganelin, Ilya wrote:

> This is 3.0.0.
>
> Sent with Good (www.good.com)
> ________________________________
> From: Thomas Weise
> Sent: Friday, March 11, 2016 2:02:13 AM
> To: dev@apex.incubator.apache.org
> Subject: Re: Long-running HDFS Write errors
>
> Which version of Malhar is this?
>
> On Thu, Mar 10, 2016 at 10:56 PM, Ganelin, Ilya
> <Ilya.Ganelin@capitalone.com> wrote:
>
> > Hello – I have a long-running job which simultaneously writes to
> > multiple files on HDFS. I am seeing the following error come up; I
> > would appreciate any insight into what's going on here.
> >
> > Stopped running due to an exception.
> > com.google.common.util.concurrent.UncheckedExecutionException:
> > java.lang.RuntimeException:
> > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
> > Failed to create file
> > [/user/vault8/citadel_out/2016_03_20_17_34_339/records/records_I@633.txt.1457678005848.tmp]
> > for [DFSClient_NONMAPREDUCE_232430238_1207] for client [10.24.28.64],
> > because this file is already being created by
> > [DFSClient_NONMAPREDUCE_-1482819983_1172] on [10.24.28.64]
> >   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:3122)
> >   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2905)
> >   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:3186)
> >   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:3149)
> >   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:611)
> >   at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.append(AuthorizationProviderProxyClientProtocol.java:124)
> >   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:416)
> >   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> >   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> >   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> >   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> >   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> >   at java.security.AccessController.doPrivileged(Native Method)
> >   at javax.security.auth.Subject.doAs(Subject.java:415)
> >   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> >   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> >
> >   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2234)
> >   at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
> >   at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
> >   at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
> >   at com.datatorrent.lib.io.fs.AbstractFileOutputOperator.processTuple(AbstractFileOutputOperator.java:667)
> >   at com.datatorrent.lib.io.fs.AbstractFileOutputOperator$1.process(AbstractFileOutputOperator.java:236)
> >   at com.datatorrent.api.DefaultInputPort.put(DefaultInputPort.java:67)
> >   at com.datatorrent.stram.stream.BufferServerSubscriber$BufferReservoir.sweep(BufferServerSubscriber.java:244)
> >   at com.datatorrent.stram.engine.GenericNode.run(GenericNode.java:226)
> >   at com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1365)
> > Caused by: java.lang.RuntimeException:
> > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
> > Failed to create file
> > [/user/vault8/citadel_out/2016_03_20_17_34_339/records/records_I@633.txt.1457678005848.tmp]
> > for [DFSClient_NONMAPREDUCE_232430238_1207] for client [10.24.28.64],
> > because this file is already being created by
> > [DFSClient_NONMAPREDUCE_-1482819983_1172] on [10.24.28.64]
> >   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:3122)
> >   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2905)
> >   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:3186)
> >   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:3149)
> >   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:611)
> >   at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.append(AuthorizationProviderProxyClientProtocol.java:124)
> >   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:416)
> >   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> >   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> >   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> >   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> >   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> >   at java.security.AccessController.doPrivileged(Native Method)
> >   at javax.security.auth.Subject.doAs(Subject.java:415)
> >   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> >   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> >
> >   at com.datatorrent.lib.io.fs.AbstractFileOutputOperator$3.load(AbstractFileOutputOperator.java:414)
> >   at com.datatorrent.lib.io.fs.AbstractFileOutputOperator$3.load(AbstractFileOutputOperator.java:334)
> >   at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
> >   at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
> >   at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
> >   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
> >   ... 9 more
> > Caused by:
> > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
> > Failed to create file
> > [/user/vault8/citadel_out/2016_03_20_17_34_339/records/records_I@633.txt.1457678005848.tmp]
> > for [DFSClient_NONMAPREDUCE_232430238_1207] for client [10.24.28.64],
> > because this file is already being created by
> > [DFSClient_NONMAPREDUCE_-1482819983_1172] on [10.24.28.64]
> >   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:3122)
> >   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2905)
> >   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:3186)
> >   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:3149)
> >   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:611)
> >   at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.append(AuthorizationProviderProxyClientProtocol.java:124)
> >   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:416)
> >   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> >   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> >   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> >   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> >   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> >   at java.security.AccessController.doPrivileged(Native Method)
> >   at javax.security.auth.Subject.doAs(Subject.java:415)
> >   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> >   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> >
> >   at org.apache.hadoop.ipc.Client.call(Client.java:1472)
> >   at org.apache.hadoop.ipc.Client.call(Client.java:1403)
> >   at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> >   at com.sun.proxy.$Proxy14.append(Unknown Source)
> >   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.append(ClientNamenodeProtocolTranslatorPB.java:313)
> >   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >   at java.lang.reflect.Method.invoke(Method.java:606)
> >   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
> >   at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
> >   at com.sun.proxy.$Proxy15.append(Unknown Source)
> >   at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1842)
> >   at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1878)
> >   at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1871)
> >   at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:329)
> >   at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:325)
> >   at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> >   at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:325)
> >   at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1172)
> >   at com.datatorrent.lib.io.fs.AbstractFileOutputOperator$3.load(AbstractFileOutputOperator.java:371)
> >   ... 14 more
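A side note on the trace above: the failure is raised from
FileSystem.append() on the .tmp file, and the NameNode reports the lease as
held by a different DFSClient instance on the same host (10.24.28.64). That
pattern typically points at an append attempt racing a lease that the
previous writer (for example, one replaced during container recovery) has
not yet released. A rough sketch of how one could probe and recover such a
lease with the public HDFS client API (the retry policy and path handling
are illustrative, not Malhar's actual recovery code):

    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class LeaseProbe
    {
      // Ask the NameNode to start lease recovery on 'path' and poll until
      // the previous writer's lease is released. Returns true on success.
      public static boolean waitForLease(FileSystem fs, Path path)
          throws IOException, InterruptedException
      {
        if (!(fs instanceof DistributedFileSystem)) {
          return true; // only HDFS uses leases
        }
        DistributedFileSystem dfs = (DistributedFileSystem)fs;
        for (int attempt = 0; attempt < 10; attempt++) {
          // recoverLease() returns true once the file is closed and the
          // old lease is gone, at which point append() is safe to retry.
          if (dfs.recoverLease(path)) {
            return true;
          }
          Thread.sleep(1000L); // illustrative back-off between probes
        }
        return false;
      }
    }

If the error instead shows up without any operator recovery in between, that
points back at two live writers racing for the same path, which is what the
partition question at the top of this thread is meant to isolate.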