flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Niels Basjes <Ni...@basjes.nl>
Subject Re: Flink job on secure Yarn fails after many hours
Date Fri, 04 Dec 2015 09:40:38 GMT
Hi Maximilian,

I just downloaded the version from your google drive and used that to run
my test topology that accesses HBase.
I deliberately started it twice to double the chance to run into this
situation.

I'll keep you posted.

Niels


On Thu, Dec 3, 2015 at 11:44 AM, Maximilian Michels <mxm@apache.org> wrote:

> Hi Niels,
>
> Just got back from our CI. The build above would fail with a
> Checkstyle error. I corrected that. Also I have built the binaries for
> your Hadoop version 2.6.0.
>
> Binaries:
>
>
> https://drive.google.com/file/d/0BziY9U_qva1sZ1FVR3RWeVNrNzA/view?usp=sharing
>
> Source:
>
> https://github.com/mxm/flink/tree/kerberos-yarn-heartbeat-fail-0.10.1
>
> git fetch https://github.com/mxm/flink/ \
> kerberos-yarn-heartbeat-fail-0.10.1 && git checkout FETCH_HEAD
>
>
> https://github.com/mxm/flink/archive/kerberos-yarn-heartbeat-fail-0.10.1.zip
>
> Thanks,
> Max
>
> On Wed, Dec 2, 2015 at 6:52 PM, Maximilian Michels <mxm@apache.org> wrote:
> > I forgot you're using Flink 0.10.1. The above was for the master.
> >
> > So here's the commit for Flink 0.10.1:
> >
> https://github.com/mxm/flink/commit/a41f3866f4097586a7b2262093088861b62930cd
> >
> > git fetch https://github.com/mxm/flink/ \
> > a41f3866f4097586a7b2262093088861b62930cd && git checkout FETCH_HEAD
> >
> >
> https://github.com/mxm/flink/archive/a41f3866f4097586a7b2262093088861b62930cd.zip
> >
> > Thanks,
> > Max
> >
> > On Wed, Dec 2, 2015 at 6:39 PM, Maximilian Michels <mxm@apache.org>
> wrote:
> >> Great. Here is the commit to try out:
> >>
> https://github.com/mxm/flink/commit/f49b9635bec703541f19cb8c615f302a07ea88b3
> >>
> >> If you already have the Flink repository, check it out using
> >>
> >> git fetch https://github.com/mxm/flink/
> >> f49b9635bec703541f19cb8c615f302a07ea88b3 && git checkout FETCH_HEAD
> >>
> >> Alternatively, here's a direct download link to the sources with the
> >> fix included:
> >>
> https://github.com/mxm/flink/archive/f49b9635bec703541f19cb8c615f302a07ea88b3.zip
> >>
> >> Thanks a lot,
> >> Max
> >>
> >> On Wed, Dec 2, 2015 at 5:44 PM, Niels Basjes <Niels@basjes.nl> wrote:
> >>> Sure, just give me the git repo url to build and I'll give it a try.
> >>>
> >>> Niels
> >>>
> >>> On Wed, Dec 2, 2015 at 4:28 PM, Maximilian Michels <mxm@apache.org>
> wrote:
> >>>>
> >>>> I mentioned that the exception gets thrown when requesting container
> >>>> status information. We need this to send a heartbeat to YARN but it
is
> >>>> not very crucial if this fails once for the running job. Possibly, we
> >>>> could work around this problem by retrying N times in case of an
> >>>> exception.
> >>>>
> >>>> Would it be possible for you to deploy a custom Flink 0.10.1 version
> >>>> we provide and test again?
> >>>>
> >>>> On Wed, Dec 2, 2015 at 4:03 PM, Niels Basjes <Niels@basjes.nl>
wrote:
> >>>> > No, I was just asking.
> >>>> > No upgrade is possible for the next month or two.
> >>>> >
> >>>> > This week is our busiest day of the year ...
> >>>> > Our shop is doing about 10 orders per second these days ...
> >>>> >
> >>>> > So they won't upgrade until next January/February
> >>>> >
> >>>> > Niels
> >>>> >
> >>>> > On Wed, Dec 2, 2015 at 3:59 PM, Maximilian Michels <mxm@apache.org>
> >>>> > wrote:
> >>>> >>
> >>>> >> Hi Niels,
> >>>> >>
> >>>> >> You mentioned you have the option to update Hadoop and redeploy
the
> >>>> >> job. Would be great if you could do that and let us know how
it
> turns
> >>>> >> out.
> >>>> >>
> >>>> >> Cheers,
> >>>> >> Max
> >>>> >>
> >>>> >> On Wed, Dec 2, 2015 at 3:45 PM, Niels Basjes <Niels@basjes.nl>
> wrote:
> >>>> >> > Hi,
> >>>> >> >
> >>>> >> > I posted the entire log from the first log line at the
moment of
> >>>> >> > failure
> >>>> >> > to
> >>>> >> > the very end of the logfile.
> >>>> >> > This is all I have.
> >>>> >> >
> >>>> >> > As far as I understand the Kerberos and Keytab mechanism
in
> Hadoop
> >>>> >> > Yarn
> >>>> >> > is
> >>>> >> > that it catches the "Invalid Token" and then (if keytab)
gets a
> new
> >>>> >> > Kerberos
> >>>> >> > ticket (or tgt?).
> >>>> >> > When the new ticket has been obtained it retries the call
that
> >>>> >> > previously
> >>>> >> > failed.
> >>>> >> > To me it seemed that this call can fail over the invalid
Token
> yet it
> >>>> >> > cannot
> >>>> >> > be retried.
> >>>> >> >
> >>>> >> > At this moment I'm thinking a bug in Hadoop.
> >>>> >> >
> >>>> >> > Niels
> >>>> >> >
> >>>> >> > On Wed, Dec 2, 2015 at 2:51 PM, Maximilian Michels <
> mxm@apache.org>
> >>>> >> > wrote:
> >>>> >> >>
> >>>> >> >> Hi Niels,
> >>>> >> >>
> >>>> >> >> Sorry for hear you experienced this exception. From
a first
> glance,
> >>>> >> >> it
> >>>> >> >> looks like a bug in Hadoop to me.
> >>>> >> >>
> >>>> >> >> > "Not retrying because the invoked method is not
idempotent,
> and
> >>>> >> >> > unable
> >>>> >> >> > to determine whether it was invoked"
> >>>> >> >>
> >>>> >> >> That is nothing to worry about. This is Hadoop's internal
retry
> >>>> >> >> mechanism that re-attempts to do actions which previously
> failed if
> >>>> >> >> that's possible. Since the action is not idempotent
(it cannot
> be
> >>>> >> >> executed again without risking to change the state
of the
> execution)
> >>>> >> >> and it also doesn't track its execution states, it
won't be
> retried
> >>>> >> >> again.
> >>>> >> >>
> >>>> >> >> The main issue is this exception:
> >>>> >> >>
> >>>> >> >> >org.apache.hadoop.security.token.SecretManager$InvalidToken:
> >>>> >> >> > Invalid
> >>>> >> >> > AMRMToken from >appattempt_1443166961758_163901_000001
> >>>> >> >>
> >>>> >> >> From the stack trace it is clear that this exception
occurs upon
> >>>> >> >> requesting container status information from the Resource
> Manager:
> >>>> >> >>
> >>>> >> >> >at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.flink.yarn.YarnJobManager$$anonfun$handleYarnMessage$1.applyOrElse(YarnJobManager.scala:259)
> >>>> >> >>
> >>>> >> >> Are there any more exceptions in the log? Do you have
the
> complete
> >>>> >> >> logs available and could you share them?
> >>>> >> >>
> >>>> >> >>
> >>>> >> >> Best regards,
> >>>> >> >> Max
> >>>> >> >>
> >>>> >> >> On Wed, Dec 2, 2015 at 11:47 AM, Niels Basjes <Niels@basjes.nl>
> >>>> >> >> wrote:
> >>>> >> >> > Hi,
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> > We have a Kerberos secured Yarn cluster here
and I'm
> experimenting
> >>>> >> >> > with
> >>>> >> >> > Apache Flink on top of that.
> >>>> >> >> >
> >>>> >> >> > A few days ago I started a very simple Flink
application (just
> >>>> >> >> > stream
> >>>> >> >> > the
> >>>> >> >> > time as a String into HBase 10 times per second).
> >>>> >> >> >
> >>>> >> >> > I (deliberately) asked our IT-ops guys to make
my account
> have a
> >>>> >> >> > max
> >>>> >> >> > ticket
> >>>> >> >> > time of 5 minutes and a max renew time of 10
minutes (yes,
> >>>> >> >> > ridiculously
> >>>> >> >> > low
> >>>> >> >> > timeout values because I needed to validate this
> >>>> >> >> > https://issues.apache.org/jira/browse/FLINK-2977).
> >>>> >> >> >
> >>>> >> >> > This job is started with a keytab file and after
running for
> 31
> >>>> >> >> > hours
> >>>> >> >> > it
> >>>> >> >> > suddenly failed with the exception you see below.
> >>>> >> >> >
> >>>> >> >> > I had the same job running for almost 400 hours
until that
> failed
> >>>> >> >> > too
> >>>> >> >> > (I
> >>>> >> >> > was
> >>>> >> >> > too late to check the logfiles but I suspect
the same
> problem).
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> > So in that time span my tickets have expired
and new tickets
> have
> >>>> >> >> > been
> >>>> >> >> > obtained several hundred times.
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> > The main error I see is that in the process of
a ticket
> expiring
> >>>> >> >> > and
> >>>> >> >> > being
> >>>> >> >> > renewed I see this message:
> >>>> >> >> >
> >>>> >> >> >      Not retrying because the invoked method
is not
> idempotent,
> >>>> >> >> > and
> >>>> >> >> > unable
> >>>> >> >> > to determine whether it was invoked
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> > Yarn on the cluster is 2.6.0 ( HDP 2.6.0.2.2.4.2-2
)
> >>>> >> >> >
> >>>> >> >> > Flink is version 0.10.1
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> > How do I fix this?
> >>>> >> >> > Is this a bug (in either Hadoop or Flink) or
am I doing
> something
> >>>> >> >> > wrong?
> >>>> >> >> > Would upgrading Yarn to 2.7.1  (i.e. HDP 2.3)
fix this?
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> > Niels Basjes
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> > 21:30:27,821 WARN
> org.apache.hadoop.security.UserGroupInformation
> >>>> >> >> > - PriviledgedActionException as:nbasjes (auth:SIMPLE)
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
> >>>> >> >> > Invalid AMRMToken from appattempt_1443166961758_163901_000001
> >>>> >> >> > 21:30:27,861 WARN  org.apache.hadoop.ipc.Client
> >>>> >> >> > - Exception encountered while connecting to the
server :
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
> >>>> >> >> > Invalid AMRMToken from appattempt_1443166961758_163901_000001
> >>>> >> >> > 21:30:27,861 WARN
> org.apache.hadoop.security.UserGroupInformation
> >>>> >> >> > - PriviledgedActionException as:nbasjes (auth:SIMPLE)
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
> >>>> >> >> > Invalid AMRMToken from appattempt_1443166961758_163901_000001
> >>>> >> >> > 21:30:27,891 WARN
> >>>> >> >> > org.apache.hadoop.io.retry.RetryInvocationHandler
> >>>> >> >> > - Exception while invoking class
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate.
> >>>> >> >> > Not retrying because the invoked method is not
idempotent, and
> >>>> >> >> > unable
> >>>> >> >> > to
> >>>> >> >> > determine whether it was invoked
> >>>> >> >> > org.apache.hadoop.security.token.SecretManager$InvalidToken:
> >>>> >> >> > Invalid
> >>>> >> >> > AMRMToken from appattempt_1443166961758_163901_000001
> >>>> >> >> >       at
> >>>> >> >> > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> >>>> >> >> > Method)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> >>>> >> >> >       at
> >>>> >> >> >
> java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
> >>>> >> >> >       at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown
> >>>> >> >> > Source)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >>>> >> >> >       at java.lang.reflect.Method.invoke(Method.java:606)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> >>>> >> >> >       at com.sun.proxy.$Proxy14.allocate(Unknown
Source)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:245)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.flink.yarn.YarnJobManager$$anonfun$handleYarnMessage$1.applyOrElse(YarnJobManager.scala:259)
> >>>> >> >> >       at
> >>>> >> >> > scala.PartialFunction$OrElse.apply(PartialFunction.scala:162)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
> >>>> >> >> >       at
> >>>> >> >> >
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
> >>>> >> >> >       at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:100)
> >>>> >> >> >       at
> akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
> >>>> >> >> >       at akka.actor.ActorCell.invoke(ActorCell.scala:487)
> >>>> >> >> >       at
> akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
> >>>> >> >> >       at akka.dispatch.Mailbox.run(Mailbox.scala:221)
> >>>> >> >> >       at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> >>>> >> >> > Caused by:
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
> >>>> >> >> > Invalid AMRMToken from appattempt_1443166961758_163901_000001
> >>>> >> >> >       at org.apache.hadoop.ipc.Client.call(Client.java:1406)
> >>>> >> >> >       at org.apache.hadoop.ipc.Client.call(Client.java:1359)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> >>>> >> >> >       at com.sun.proxy.$Proxy13.allocate(Unknown
Source)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
> >>>> >> >> >       ... 29 more
> >>>> >> >> > 21:30:27,943 ERROR akka.actor.OneForOneStrategy
> >>>> >> >> > - Invalid AMRMToken from
> appattempt_1443166961758_163901_000001
> >>>> >> >> > org.apache.hadoop.security.token.SecretManager$InvalidToken:
> >>>> >> >> > Invalid
> >>>> >> >> > AMRMToken from appattempt_1443166961758_163901_000001
> >>>> >> >> >       at
> >>>> >> >> > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> >>>> >> >> > Method)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> >>>> >> >> >       at
> >>>> >> >> >
> java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
> >>>> >> >> >       at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown
> >>>> >> >> > Source)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >>>> >> >> >       at java.lang.reflect.Method.invoke(Method.java:606)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> >>>> >> >> >       at com.sun.proxy.$Proxy14.allocate(Unknown
Source)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:245)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.flink.yarn.YarnJobManager$$anonfun$handleYarnMessage$1.applyOrElse(YarnJobManager.scala:259)
> >>>> >> >> >       at
> >>>> >> >> > scala.PartialFunction$OrElse.apply(PartialFunction.scala:162)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
> >>>> >> >> >       at
> >>>> >> >> >
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
> >>>> >> >> >       at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:100)
> >>>> >> >> >       at
> akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
> >>>> >> >> >       at akka.actor.ActorCell.invoke(ActorCell.scala:487)
> >>>> >> >> >       at
> akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
> >>>> >> >> >       at akka.dispatch.Mailbox.run(Mailbox.scala:221)
> >>>> >> >> >       at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> >>>> >> >> > Caused by:
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
> >>>> >> >> > Invalid AMRMToken from appattempt_1443166961758_163901_000001
> >>>> >> >> >       at org.apache.hadoop.ipc.Client.call(Client.java:1406)
> >>>> >> >> >       at org.apache.hadoop.ipc.Client.call(Client.java:1359)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> >>>> >> >> >       at com.sun.proxy.$Proxy13.allocate(Unknown
Source)
> >>>> >> >> >       at
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
> >>>> >> >> >       ... 29 more
> >>>> >> >> > 21:30:28,075 INFO  org.apache.flink.yarn.YarnJobManager
> >>>> >> >> > - Stopping JobManager
> >>>> >> >> > akka.tcp://flink@10.10.200.3:39527/user/jobmanager.
> >>>> >> >> > 21:30:28,088 INFO
> >>>> >> >> > org.apache.flink.runtime.executiongraph.ExecutionGraph
> >>>> >> >> > - Source: Custom Source -> Sink: Unnamed (1/1)
> >>>> >> >> > (db0d95c11c14505827e696eec7efab94) switched from
RUNNING to
> >>>> >> >> > CANCELING
> >>>> >> >> > 21:30:28,113 INFO
> >>>> >> >> > org.apache.flink.runtime.executiongraph.ExecutionGraph
> >>>> >> >> > - Source: Custom Source -> Sink: Unnamed (1/1)
> >>>> >> >> > (db0d95c11c14505827e696eec7efab94) switched from
CANCELING to
> >>>> >> >> > FAILED
> >>>> >> >> > 21:30:28,184 INFO  org.apache.flink.runtime.blob.BlobServer
> >>>> >> >> > - Stopped BLOB server at 0.0.0.0:41281
> >>>> >> >> > 21:30:28,185 ERROR
> org.apache.flink.runtime.jobmanager.JobManager
> >>>> >> >> > - Actor akka://flink/user/jobmanager#403236912
terminated,
> >>>> >> >> > stopping
> >>>> >> >> > process...
> >>>> >> >> > 21:30:28,286 INFO
> >>>> >> >> > org.apache.flink.runtime.webmonitor.WebRuntimeMonitor
> >>>> >> >> > - Removing web root dir
> >>>> >> >> > /tmp/flink-web-e1a44f94-ea6d-40ee-b87c-e3122d5cb9bd
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> > --
> >>>> >> >> > Best regards / Met vriendelijke groeten,
> >>>> >> >> >
> >>>> >> >> > Niels Basjes
> >>>> >> >
> >>>> >> >
> >>>> >> >
> >>>> >> >
> >>>> >> > --
> >>>> >> > Best regards / Met vriendelijke groeten,
> >>>> >> >
> >>>> >> > Niels Basjes
> >>>> >
> >>>> >
> >>>> >
> >>>> >
> >>>> > --
> >>>> > Best regards / Met vriendelijke groeten,
> >>>> >
> >>>> > Niels Basjes
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> Best regards / Met vriendelijke groeten,
> >>>
> >>> Niels Basjes
>



-- 
Best regards / Met vriendelijke groeten,

Niels Basjes

Mime
View raw message