flink-user mailing list archives

From Stefano Baghino <stefano.bagh...@radicalbit.io>
Subject Re: Kerberos on YARN: delegation or proxying?
Date Sun, 06 Mar 2016 21:26:03 GMT
Ok, thank you for the very detailed explanation!

On Sun, Mar 6, 2016 at 10:02 PM, Maximilian Michels <mxm@apache.org> wrote:

> Hi Stefano,
>
> That is currently a limitation of the Kerberos implementation. Kerberos
> authentication is performed only once, when the Flink cluster is
> brought up. The YARN session is then tied to that particular user's
> ticket. Note that you need Hadoop 2.6.1 or higher to run long-running
> jobs, because a bug in the Kerberos client of earlier versions may
> let the ticket expire.
>
> The workaround you already mentioned is to use a per-job Yarn cluster.
> There is currently no plan to delegate the user token per job but we
> could certainly think about implementing this in the future.
>
>
> https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#kerberos
>
> Cheers,
> Max
>
> On Sun, Mar 6, 2016 at 9:27 PM, Stefano Baghino
> <stefano.baghino@radicalbit.io> wrote:
> > One last note: initially I tried to run the session as the same OS user,
> > running kdestroy and then kinit with the other user, and got this error.
> > When I instead run the job in a different OS session, authenticated with
> > Kerberos as the user who should run the job, I can't connect to the
> > JobManager. I've added a second log with this error to the gist.
> >
> > On Sun, Mar 6, 2016 at 9:01 PM, Stefano Baghino
> > <stefano.baghino@radicalbit.io> wrote:
> >>
> >> In the initial description, I meant "I'm trying to access a private
> >> folder of the latter", so not the service account. Sorry for the mistake.
> >>
> >> On Sun, Mar 6, 2016 at 8:54 PM, Stefano Baghino
> >> <stefano.baghino@radicalbit.io> wrote:
> >>>
> >>> Hello everybody,
> >>>
> >>> I'm running some tests on how Flink as a long-running YARN session
> >>> handles security with Kerberos. In particular, I'm running a test
> >>> where I run Flink on YARN with a service account and then deploy a
> >>> job via CLI as another user; in the job I'm trying to access a
> >>> private folder of the former on HDFS but the job fails due to
> >>> permission issues (the user running the job is actually the one who
> >>> ran Flink on YARN in the first place, i.e. the service account).
> >>>
> >>> I'm running Flink 1.0.0-RC5, launching the long-running session with:
> >>>
> >>> bin/yarn-session.sh -n 2 -tm 4096 -s 3
> >>>
> >>> and then running the following command:
> >>>
> >>> bin/flink run examples/batch/WordCount.jar \
> >>> --input hdfs:///user/stefano.baghino/hamlet.txt \
> >>> --output hdfs:///user/stefano.baghino/hamlet.out
> >>>
> >>> Here are the logs:
> >>> https://gist.github.com/stefanobaghino/6605ec33a1c4b632fb78
> >>>
> >>> It looks like the YARN session is acting as a proxy for the user
> >>> instead of receiving a delegation. Is there a way to change this
> >>> behavior? Is this by design? Is there an interest in implementing
> >>> the delegation (if it's not already implemented)? Otherwise, is
> >>> there a workaround, apart from running one-off jobs on YARN?
> >>>
> >>> Thank you so much in advance.
> >>>
> >>> --
> >>> BR,
> >>> Stefano Baghino
> >>>
> >>> Software Engineer @ Radicalbit



-- 
BR,
Stefano Baghino

Software Engineer @ Radicalbit
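
For reference, the per-job workaround mentioned in the quoted reply can be
sketched roughly as follows. This is only an illustration under assumptions:
the -yn/-ytm/-ys values simply mirror the "-n 2 -tm 4096 -s 3" session
settings from the thread, the principal name is a placeholder, and
per_job_cmd is a hypothetical helper that prints the command so it can be
reviewed before it is run.

```shell
#!/bin/sh
# Sketch: submit the job to its own per-job YARN cluster, so the cluster is
# brought up with the submitting user's Kerberos ticket rather than with
# the ticket of the service account that owns a long-running session.

PRINCIPAL="stefano.baghino"             # placeholder principal (assumption)
HDFS_DIR="hdfs:///user/stefano.baghino" # paths taken from the thread

# Hypothetical helper: prints the submission command instead of running it.
# -m yarn-cluster starts a dedicated YARN cluster for this one job;
# -yn, -ytm and -ys set containers, TaskManager memory (MB) and slots.
per_job_cmd() {
  echo "bin/flink run -m yarn-cluster -yn 2 -ytm 4096 -ys 3" \
       "examples/batch/WordCount.jar" \
       "--input $HDFS_DIR/hamlet.txt" \
       "--output $HDFS_DIR/hamlet.out"
}

# Dry run: show what would be executed.
per_job_cmd

# To run it for real, obtain a ticket as the job's user first, e.g.:
#   kinit "$PRINCIPAL" && eval "$(per_job_cmd)"
```

Because each submission starts its own cluster, the Kerberos authentication
happens per job, at the cost of a slower start-up than a shared session.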
