flink-user mailing list archives

From Robert Metzger <rmetz...@apache.org>
Subject Re: Flink job on secure Yarn fails after many hours
Date Wed, 12 Apr 2017 15:35:30 GMT
Niels, are you still facing this issue?

As far as I understood it, the security changes in Flink 1.2.0 use a new
Kerberos mechanism that allows infinite token renewal.

On Thu, Mar 17, 2016 at 7:30 AM, Maximilian Michels <mxm@apache.org> wrote:

> Hi Niels,
>
> Thanks for the feedback. As far as I know, Hadoop deliberately
> defaults to the one week maximum life time of delegation tokens. Have
> you tried increasing the maximum token life time or was that not an
> option?
>
> I wonder why you use a while loop. Would it be possible to use the
> Yarn failover mechanism which starts a new ApplicationMaster and
> resubmits the job?
>
> Thanks,
> Max
>
>
> On Thu, Mar 17, 2016 at 12:43 PM, Niels Basjes <Niels@basjes.nl> wrote:
> > Hi,
> >
> > In my environment doing the "proxy" thing didn't work.
> > With a token expiry of 168 hours (1 week) the job consistently terminates
> > at exactly (within a margin of 10 seconds) 173.5 hours.
> > So far we have not been able to solve this problem.
> >
> > Our teams now simply assume the thing fails once in a while and have an
> > automatic restart feature (i.e. shell script with a while true loop).
> > The best guess at a root cause is this
> > https://issues.apache.org/jira/browse/HDFS-9276
> >
> > If you have a real solution or a reference to a related bug report for
> > this problem then please share!
> >
> > Niels Basjes
> >
> >
> >
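The automatic restart described above (a shell script with a `while true` loop) could look roughly like the following. This is only a minimal sketch, not the script the teams actually used: the function name `run_with_restart`, its parameters, and the choice to stop on a clean exit are all assumptions made for illustration.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence


def run_with_restart(
    command: Sequence[str],
    max_restarts: Optional[int] = None,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `command` and start it again whenever it exits with a non-zero code.

    Returns the number of times the command was started. With
    max_restarts=None the loop ends only on a clean (zero) exit, which is
    close to a shell `while true` loop for a job that never exits cleanly.
    """
    starts = 0
    while True:
        starts += 1
        exit_code = subprocess.run(command).returncode
        if exit_code == 0:
            # Clean exit: assume the job finished on purpose (hypothetical policy).
            return starts
        if max_restarts is not None and starts > max_restarts:
            # Restart budget used up: give up and report how often we tried.
            return starts
        # Wait a little before resubmitting so a broken job does not spin.
        sleep(delay_seconds)
```

A call such as `run_with_restart(["flink", "run", "-m", "yarn-cluster", "my-job.jar"])` would resubmit the job each time it dies; the jar name here is hypothetical. Unlike a plain `while true`, this sketch can be bounded with `max_restarts`, which makes a persistent failure visible instead of hiding it behind endless restarts.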
> > On Thu, Mar 17, 2016 at 10:20 AM, Thomas Lamirault
> > <thomas.lamirault@ericsson.com> wrote:
> >>
> >> Hi Max,
> >>
> >> I will try these workarounds.
> >> Thanks
> >>
> >> Thomas
> >>
> >> ________________________________________
> >> From: Maximilian Michels [mxm@apache.org]
> >> Sent: Tuesday, March 15, 2016 16:51
> >> To: user@flink.apache.org
> >> Cc: Niels Basjes
> >> Subject: Re: Flink job on secure Yarn fails after many hours
> >>
> >> Hi Thomas,
> >>
> >> Niels (CC) and I found out that you need at least Hadoop version 2.6.1
> >> to properly run Kerberos applications on Hadoop clusters. Versions
> >> before that have critical bugs related to the internal security token
> >> handling that may expire the token although it is still valid.
> >>
> >> That said, there is another limitation of Hadoop that the maximum
> >> internal token life time is one week. To work around this limit, you
> >> have two options:
> >>
> >> a) increase the maximum token lifetime
> >>
> >> In yarn-site.xml:
> >>
> >> <property>
> >>   <name>yarn.resourcemanager.delegation.token.max-lifetime</name>
> >>   <value>9223372036854775807</value>
> >> </property>
> >>
> >> In hdfs-site.xml:
> >>
> >> <property>
> >>   <name>dfs.namenode.delegation.token.max-lifetime</name>
> >>   <value>9223372036854775807</value>
> >> </property>
> >>
> >>
> >> b) set up the Yarn ResourceManager as a proxy for the HDFS NameNode:
> >>
> >> From
> >> http://www.cloudera.com/documentation/enterprise/5-3-x/topics/cm_sg_yarn_long_jobs.html
> >>
> >> "You can work around this by configuring the ResourceManager as a
> >> proxy user for the corresponding HDFS NameNode so that the
> >> ResourceManager can request new tokens when the existing ones are past
> >> their maximum lifetime."
> >>
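To put the numbers in this thread side by side, here is a small arithmetic sketch. It assumes both `max-lifetime` properties are expressed in milliseconds (my understanding of these settings, not something stated in the thread), and all constant and function names are made up for illustration.

```python
MS_PER_HOUR = 60 * 60 * 1000

# One week in milliseconds; the thread describes one week as the maximum
# token lifetime (assumption: the properties are in milliseconds).
ONE_WEEK_MS = 7 * 24 * MS_PER_HOUR

# Value used in the XML snippets in option a); equal to Java's Long.MAX_VALUE.
CONFIGURED_MAX_MS = 9223372036854775807


def ms_to_hours(ms: int) -> float:
    """Convert a duration in milliseconds to hours."""
    return ms / MS_PER_HOUR


def ms_to_years(ms: int) -> float:
    """Convert a duration in milliseconds to years of 365.25 days."""
    return ms / (MS_PER_HOUR * 24 * 365.25)


def hours_past_limit(observed_hours: float, limit_ms: int) -> float:
    """How long after the lifetime limit a job was observed to fail."""
    return observed_hours - ms_to_hours(limit_ms)
```

Under these assumptions, `ms_to_hours(ONE_WEEK_MS)` gives the 168 hours mentioned elsewhere in the thread, `hours_past_limit(173.5, ONE_WEEK_MS)` gives the 5.5-hour gap between the token limit and the reported failure time, and `ms_to_years(CONFIGURED_MAX_MS)` shows that the value in option a) is effectively "never expire".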
> >> @Niels: Could you comment on what worked best for you?
> >>
> >> Best,
> >> Max
> >>
> >>
> >> On Mon, Mar 14, 2016 at 12:24 PM, Thomas Lamirault
> >> <thomas.lamirault@ericsson.com> wrote:
> >> >
> >> > Hello everyone,
> >> >
> >> >
> >> >
> >> > We are now facing the same problem in our Flink applications, launched
> >> > using YARN.
> >> >
> >> > I just want to know whether there is any update on this exception.
> >> >
> >> >
> >> >
> >> > Thanks
> >> >
> >> >
> >> >
> >> > Thomas
> >> >
> >> >
> >> >
> >> > ________________________________
> >> >
> >> > From: niels@basj.es [niels@basj.es] on behalf of Niels Basjes
> >> > [Niels@basjes.nl]
> >> > Sent: Friday, December 4, 2015 10:40
> >> > To: user@flink.apache.org
> >> > Subject: Re: Flink job on secure Yarn fails after many hours
> >> >
> >> > Hi Maximilian,
> >> >
> >> > I just downloaded the version from your google drive and used that to
> >> > run my test topology that accesses HBase.
> >> > I deliberately started it twice to double the chance to run into this
> >> > situation.
> >> >
> >> > I'll keep you posted.
> >> >
> >> > Niels
> >> >
> >> >
> >> > On Thu, Dec 3, 2015 at 11:44 AM, Maximilian Michels <mxm@apache.org>
> >> > wrote:
> >> >>
> >> >> Hi Niels,
> >> >>
> >> >> Just got back from our CI. The build above would fail with a
> >> >> Checkstyle error. I corrected that. I have also built the binaries for
> >> >> your Hadoop version 2.6.0.
> >> >>
> >> >> Binaries:
> >> >>
> >> >>
> >> >> https://github.com/mxm/flink/archive/kerberos-yarn-heartbeat-fail-0.10.1.zip
> >> >>
> >> >> Thanks,
> >> >> Max
> >> >>
> >> >> On Wed, Dec 2, 2015 at 6:52 PM, Maximilian Michels <0.0.0.0:41281
> >> >> >>>> >> >> > 21:30:28,185 ERROR
> >> >> >>>> >> >> > org.apache.flink.runtime.jobmanager.JobManager
> >> >> >>>> >> >> > - Actor akka://flink/user/jobmanager#403236912 terminated,
> >> >> >>>> >> >> > stopping process...
> >> >> >>>> >> >> > 21:30:28,286 INFO
> >> >> >>>> >> >> > org.apache.flink.runtime.webmonitor.WebRuntimeMonitor
> >> >> >>>> >> >> > - Removing web root dir
> >> >> >>>> >> >> > /tmp/flink-web-e1a44f94-ea6d-40ee-b87c-e3122d5cb9bd
> >> >> >>>> >> >> >
> >> >> >>>> >> >> >
> >> >> >>>> >> >> > --
> >> >> >>>> >> >> > Best regards / Met vriendelijke groeten,
> >> >> >>>> >> >> >
> >> >> >>>> >> >> > Niels Basjes
