apex-dev mailing list archives

From Chandni Singh <chan...@datatorrent.com>
Subject Re: Long-running HDFS Write errors
Date Fri, 11 Mar 2016 21:04:43 GMT
Hi Ilya,

Can you please share the log files for this container?

Is the log level set to 'DEBUG'?

Thanks,
Chandni



On Fri, Mar 11, 2016 at 8:57 AM, Chaitanya Chebolu <chaitanya@datatorrent.com> wrote:

> I think rolling is not happening, and this depends on the "rollingFile"
> property.
> By default, rollingFile is false. The "rollingFile" property is true only
> if one of the conditions below is satisfied:
>
>    - maxLength < Long.MAX_VALUE
>    - rotationWindows > 0
>
> Please check by setting one of the above properties.
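>
> For example, something along these lines in populateDAG() should do it (a
> sketch only: the RecordWriter subclass and operator name are placeholders,
> and it assumes your Malhar version exposes the setMaxLength and
> setRotationWindows setters on AbstractFileOutputOperator):
>
> @Override
> public void populateDAG(DAG dag, Configuration conf)
> {
>   // Placeholder subclass of AbstractFileOutputOperator.
>   RecordWriter writer = dag.addOperator("recordWriter", new RecordWriter());
>   writer.setFilePath("/user/vault8/citadel_out");
>   // Either of these should flip rollingFile to true:
>   writer.setMaxLength(128L * 1024 * 1024); // size-based: roll at ~128 MB per part file
>   // writer.setRotationWindows(120);       // or window-based: roll every 120 streaming windows
> }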
>
> On Fri, Mar 11, 2016 at 9:48 PM, Ganelin, Ilya <Ilya.Ganelin@capitalone.com> wrote:
>
> > This is happening after some time, but file roll-over appears to be working well with this approach in other instances.
> >
> > On 3/11/16, 8:02 AM, "Sandeep Deshmukh" <sandeep@datatorrent.com> wrote:
> >
> > >Is this happening for the first file itself, or only after some time?
> > >
> > >Maybe the file is getting rolled over to the next file, but since you are overriding the default file naming policy, the rollover is also trying to write to the same file.
> > >
> > >Regards,
> > >Sandeep
> > >
> > >On Fri, Mar 11, 2016 at 9:21 PM, Ganelin, Ilya <Ilya.Ganelin@capitalone.com> wrote:
> > >
> > >> I explicitly assign a different name for each partition of the operator as well, based on the context ID.
> > >>
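> > >> Something along these lines (a sketch only; the RecordWriter name, the String tuple type, and the base file name are placeholders):
> > >>
> > >> import com.datatorrent.api.Context;
> > >> import com.datatorrent.lib.io.fs.AbstractFileOutputOperator;
> > >>
> > >> public class RecordWriter extends AbstractFileOutputOperator<String>
> > >> {
> > >>   private transient String fileName;
> > >>
> > >>   @Override
> > >>   public void setup(Context.OperatorContext context)
> > >>   {
> > >>     // The physical operator id is unique per partition, so each
> > >>     // partition writes to its own file in the output directory.
> > >>     fileName = "records_" + context.getId() + ".txt";
> > >>     super.setup(context);
> > >>   }
> > >>
> > >>   @Override
> > >>   protected String getFileName(String tuple)
> > >>   {
> > >>     return fileName;
> > >>   }
> > >>
> > >>   @Override
> > >>   protected byte[] getBytesForTuple(String tuple)
> > >>   {
> > >>     return (tuple + "\n").getBytes();
> > >>   }
> > >> }
> > >>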
> > >> On 3/11/16, 7:34 AM, "Sandeep Deshmukh" <sandeep@datatorrent.com> wrote:
> > >>
> > >> >The AbstractFileOutputOperator creates files with a timestamp in the file name, so a conflict in the name suggests that the same operator could be trying to write to the same file.
> > >> >Does this happen after operator recovery, or before any other failure occurs?
> > >> >
> > >> >Is it possible that multiple partitions write to the same directory?
> > >> >
> > >> >
> > >> >
> > >> >On Fri, Mar 11, 2016 at 7:12 AM, Ganelin, Ilya <Ilya.Ganelin@capitalone.com> wrote:
> > >> >
> > >> >> This is 3.0.0.
> > >> >>
> > >> >>
> > >> >>
> > >> >> Sent with Good (www.good.com)
> > >> >> ________________________________
> > >> >> From: Thomas Weise <thomas@datatorrent.com>
> > >> >> Sent: Friday, March 11, 2016 2:02:13 AM
> > >> >> To: dev@apex.incubator.apache.org
> > >> >> Subject: Re: Long-running HDFS Write errors
> > >> >>
> > >> >> Which version of Malhar is this?
> > >> >>
> > >> >>
> > >> >> On Thu, Mar 10, 2016 at 10:56 PM, Ganelin, Ilya <Ilya.Ganelin@capitalone.com> wrote:
> > >> >>
> > >> >> > Hello – I have a long-running job which simultaneously writes to multiple files on HDFS. I am seeing the following error come up:
> > >> >> >
> > >> >> > I would appreciate any insight into what’s going on here.
> > >> >> >
> > >> >> >
> > >> >> > Stopped running due to an exception.
> > >> >> > com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to create file [/user/vault8/citadel_out/2016_03_20_17_34_339/records/records_I@633.txt.1457678005848.tmp] for [DFSClient_NONMAPREDUCE_232430238_1207] for client [10.24.28.64], because this file is already being created by [DFSClient_NONMAPREDUCE_-1482819983_1172] on [10.24.28.64]
> > >> >> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:3122)
> > >> >> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2905)
> > >> >> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:3186)
> > >> >> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:3149)
> > >> >> >         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:611)
> > >> >> >         at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.append(AuthorizationProviderProxyClientProtocol.java:124)
> > >> >> >         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:416)
> > >> >> >         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> > >> >> >         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> > >> >> >         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> > >> >> >         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> > >> >> >         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> > >> >> >         at java.security.AccessController.doPrivileged(Native Method)
> > >> >> >         at javax.security.auth.Subject.doAs(Subject.java:415)
> > >> >> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> > >> >> >         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> > >> >> >
> > >> >> >         at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2234)
> > >> >> >         at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
> > >> >> >         at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
> > >> >> >         at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
> > >> >> >         at com.datatorrent.lib.io.fs.AbstractFileOutputOperator.processTuple(AbstractFileOutputOperator.java:667)
> > >> >> >         at com.datatorrent.lib.io.fs.AbstractFileOutputOperator$1.process(AbstractFileOutputOperator.java:236)
> > >> >> >         at com.datatorrent.api.DefaultInputPort.put(DefaultInputPort.java:67)
> > >> >> >         at com.datatorrent.stram.stream.BufferServerSubscriber$BufferReservoir.sweep(BufferServerSubscriber.java:244)
> > >> >> >         at com.datatorrent.stram.engine.GenericNode.run(GenericNode.java:226)
> > >> >> >         at com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1365)
> > >> >> > Caused by: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to create file [/user/vault8/citadel_out/2016_03_20_17_34_339/records/records_I@633.txt.1457678005848.tmp] for [DFSClient_NONMAPREDUCE_232430238_1207] for client [10.24.28.64], because this file is already being created by [DFSClient_NONMAPREDUCE_-1482819983_1172] on [10.24.28.64]
> > >> >> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:3122)
> > >> >> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2905)
> > >> >> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:3186)
> > >> >> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:3149)
> > >> >> >         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:611)
> > >> >> >         at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.append(AuthorizationProviderProxyClientProtocol.java:124)
> > >> >> >         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:416)
> > >> >> >         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> > >> >> >         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> > >> >> >         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> > >> >> >         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> > >> >> >         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> > >> >> >         at java.security.AccessController.doPrivileged(Native Method)
> > >> >> >         at javax.security.auth.Subject.doAs(Subject.java:415)
> > >> >> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> > >> >> >         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> > >> >> >
> > >> >> >         at com.datatorrent.lib.io.fs.AbstractFileOutputOperator$3.load(AbstractFileOutputOperator.java:414)
> > >> >> >         at com.datatorrent.lib.io.fs.AbstractFileOutputOperator$3.load(AbstractFileOutputOperator.java:334)
> > >> >> >         at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
> > >> >> >         at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
> > >> >> >         at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
> > >> >> >         at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
> > >> >> >         ... 9 more
> > >> >> > Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to create file [/user/vault8/citadel_out/2016_03_20_17_34_339/records/records_I@633.txt.1457678005848.tmp] for [DFSClient_NONMAPREDUCE_232430238_1207] for client [10.24.28.64], because this file is already being created by [DFSClient_NONMAPREDUCE_-1482819983_1172] on [10.24.28.64]
> > >> >> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:3122)
> > >> >> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2905)
> > >> >> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:3186)
> > >> >> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:3149)
> > >> >> >         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:611)
> > >> >> >         at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.append(AuthorizationProviderProxyClientProtocol.java:124)
> > >> >> >         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:416)
> > >> >> >         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> > >> >> >         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> > >> >> >         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> > >> >> >         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> > >> >> >         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> > >> >> >         at java.security.AccessController.doPrivileged(Native Method)
> > >> >> >         at javax.security.auth.Subject.doAs(Subject.java:415)
> > >> >> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> > >> >> >         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> > >> >> >
> > >> >> >         at org.apache.hadoop.ipc.Client.call(Client.java:1472)
> > >> >> >         at org.apache.hadoop.ipc.Client.call(Client.java:1403)
> > >> >> >         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> > >> >> >         at com.sun.proxy.$Proxy14.append(Unknown Source)
> > >> >> >         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.append(ClientNamenodeProtocolTranslatorPB.java:313)
> > >> >> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >> >> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > >> >> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > >> >> >         at java.lang.reflect.Method.invoke(Method.java:606)
> > >> >> >         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
> > >> >> >         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
> > >> >> >         at com.sun.proxy.$Proxy15.append(Unknown Source)
> > >> >> >         at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1842)
> > >> >> >         at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1878)
> > >> >> >         at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1871)
> > >> >> >         at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:329)
> > >> >> >         at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:325)
> > >> >> >         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> > >> >> >         at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:325)
> > >> >> >         at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1172)
> > >> >> >         at com.datatorrent.lib.io.fs.AbstractFileOutputOperator$3.load(AbstractFileOutputOperator.java:371)
> > >> >> >         ... 14 more
>
