apex-dev mailing list archives

From: Thomas Weise <tho...@datatorrent.com>
Subject: Re: Long-running HDFS Write errors
Date: Fri, 11 Mar 2016 15:24:11 GMT
Does this happen only after an operator recovery, or before any failure has occurred?
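
If this is the recovery path: on restart the writer reopens its .tmp file for
append while the previous container's DFSClient may still hold the HDFS lease,
which produces exactly this AlreadyBeingCreatedException. A minimal sketch of
forcing lease recovery before the append; LeaseUtil/waitForLease are made-up
names for illustration, not Malhar API:

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

final class LeaseUtil
{
  // Ask the NameNode to revoke the previous writer's lease. recoverLease()
  // returns true once recovery has completed and the file is closed.
  static void waitForLease(DistributedFileSystem dfs, Path file)
      throws IOException, InterruptedException
  {
    while (!dfs.recoverLease(file)) {
      Thread.sleep(1000);  // poll until the lease is actually free
    }
  }
}

Calling something like this on the operator's tmp file before reopening it
would confirm or rule out a stale lease left behind by the failed container.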

Is it possible that multiple partitions write to the same directory?
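
If two physical partitions can compute the same file name, they will race for
the same HDFS lease and the loser fails with exactly this exception, even
without any failure. A minimal sketch of keying the name off the partition,
against the Malhar 3.0.0 AbstractFileOutputOperator contract
(PartitionedRecordWriter is a made-up name; byte[] tuples assumed):

import com.datatorrent.api.Context.OperatorContext;
import com.datatorrent.lib.io.fs.AbstractFileOutputOperator;

public class PartitionedRecordWriter extends AbstractFileOutputOperator<byte[]>
{
  private transient int operatorId;

  @Override
  public void setup(OperatorContext context)
  {
    operatorId = context.getId();  // unique per physical operator instance
    super.setup(context);
  }

  @Override
  protected String getFileName(byte[] tuple)
  {
    // e.g. records_42.txt -- each partition then holds exactly one lease
    return "records_" + operatorId + ".txt";
  }

  @Override
  protected byte[] getBytesForTuple(byte[] tuple)
  {
    return tuple;
  }
}

If your getFileName() implementation does not include something
partition-unique, two writers can collide on the same path like this.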



On Fri, Mar 11, 2016 at 7:12 AM, Ganelin, Ilya <Ilya.Ganelin@capitalone.com> wrote:

> This is 3.0.0.
>
>
>
> Sent with Good (www.good.com)
> ________________________________
> From: Thomas Weise <thomas@datatorrent.com>
> Sent: Friday, March 11, 2016 2:02:13 AM
> To: dev@apex.incubator.apache.org
> Subject: Re: Long-running HDFS Write errors
>
> Which version of Malhar is this?
>
>
> On Thu, Mar 10, 2016 at 10:56 PM, Ganelin, Ilya <Ilya.Ganelin@capitalone.com> wrote:
>
> > Hello – I have a long-running job which simultaneously writes to multiple
> > files on HDFS. I am seeing the error below come up, and I would appreciate
> > any insight into what's going on here.
> >
> > Stopped running due to an exception.
> > com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to create file [/user/vault8/citadel_out/2016_03_20_17_34_339/records/records_I@633.txt.1457678005848.tmp] for [DFSClient_NONMAPREDUCE_232430238_1207] for client [10.24.28.64], because this file is already being created by [DFSClient_NONMAPREDUCE_-1482819983_1172] on [10.24.28.64]
> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:3122)
> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2905)
> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:3186)
> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:3149)
> >         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:611)
> >         at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.append(AuthorizationProviderProxyClientProtocol.java:124)
> >         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:416)
> >         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> >         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> >         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> >         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> >         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> >         at java.security.AccessController.doPrivileged(Native Method)
> >         at javax.security.auth.Subject.doAs(Subject.java:415)
> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> >         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> >
> >         at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2234)
> >         at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
> >         at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
> >         at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
> >         at com.datatorrent.lib.io.fs.AbstractFileOutputOperator.processTuple(AbstractFileOutputOperator.java:667)
> >         at com.datatorrent.lib.io.fs.AbstractFileOutputOperator$1.process(AbstractFileOutputOperator.java:236)
> >         at com.datatorrent.api.DefaultInputPort.put(DefaultInputPort.java:67)
> >         at com.datatorrent.stram.stream.BufferServerSubscriber$BufferReservoir.sweep(BufferServerSubscriber.java:244)
> >         at com.datatorrent.stram.engine.GenericNode.run(GenericNode.java:226)
> >         at com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1365)
> > Caused by: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to create file [/user/vault8/citadel_out/2016_03_20_17_34_339/records/records_I@633.txt.1457678005848.tmp] for [DFSClient_NONMAPREDUCE_232430238_1207] for client [10.24.28.64], because this file is already being created by [DFSClient_NONMAPREDUCE_-1482819983_1172] on [10.24.28.64]
> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:3122)
> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2905)
> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:3186)
> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:3149)
> >         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:611)
> >         at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.append(AuthorizationProviderProxyClientProtocol.java:124)
> >         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:416)
> >         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> >         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> >         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> >         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> >         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> >         at java.security.AccessController.doPrivileged(Native Method)
> >         at javax.security.auth.Subject.doAs(Subject.java:415)
> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> >         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> >
> >         at com.datatorrent.lib.io.fs.AbstractFileOutputOperator$3.load(AbstractFileOutputOperator.java:414)
> >         at com.datatorrent.lib.io.fs.AbstractFileOutputOperator$3.load(AbstractFileOutputOperator.java:334)
> >         at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
> >         at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
> >         at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
> >         at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
> >         ... 9 more
> > Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to create file [/user/vault8/citadel_out/2016_03_20_17_34_339/records/records_I@633.txt.1457678005848.tmp] for [DFSClient_NONMAPREDUCE_232430238_1207] for client [10.24.28.64], because this file is already being created by [DFSClient_NONMAPREDUCE_-1482819983_1172] on [10.24.28.64]
> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:3122)
> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2905)
> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:3186)
> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:3149)
> >         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:611)
> >         at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.append(AuthorizationProviderProxyClientProtocol.java:124)
> >         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:416)
> >         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> >         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> >         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> >         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> >         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> >         at java.security.AccessController.doPrivileged(Native Method)
> >         at javax.security.auth.Subject.doAs(Subject.java:415)
> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> >         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> >
> >         at org.apache.hadoop.ipc.Client.call(Client.java:1472)
> >         at org.apache.hadoop.ipc.Client.call(Client.java:1403)
> >         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> >         at com.sun.proxy.$Proxy14.append(Unknown Source)
> >         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.append(ClientNamenodeProtocolTranslatorPB.java:313)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >         at java.lang.reflect.Method.invoke(Method.java:606)
> >         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
> >         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
> >         at com.sun.proxy.$Proxy15.append(Unknown Source)
> >         at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1842)
> >         at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1878)
> >         at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1871)
> >         at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:329)
> >         at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:325)
> >         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> >         at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:325)
> >         at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1172)
> >         at com.datatorrent.lib.io.fs.AbstractFileOutputOperator$3.load(AbstractFileOutputOperator.java:371)
> >         ... 14 more
