flink-user mailing list archives

From Maximilian Michels <...@apache.org>
Subject Re: Kerberos on YARN: delegation or proxying?
Date Sun, 06 Mar 2016 21:02:21 GMT
Hi Stefano,

That is currently a limitation of the Kerberos implementation.
Kerberos authentication is performed only once, when the Flink cluster
is brought up. The YARN session is then tied to that particular user's
ticket. Note that you need Hadoop 2.6.1 or higher to run long-running
jobs, because of a bug in the Kerberos client of earlier versions that
may let the ticket expire.

The workaround you already mentioned is to use a per-job YARN cluster.
There is currently no plan to delegate the user token per job, but we
could certainly think about implementing this in the future.
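
As a rough sketch of the per-job approach (untested against your setup):
authenticate as the user who should own the job, then submit the job with
-m yarn-cluster so that a dedicated YARN cluster is brought up with that
user's ticket. The principal and realm below are hypothetical placeholders.
The -yn/-ytm/-ys values simply mirror the -n/-tm/-s options from your
yarn-session.sh call. The script only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch: submit WordCount as its own per-job YARN cluster instead of
# sending it to a long-running session started by a service account.

# Assumption: hypothetical principal and realm, replace with the real ones.
PRINCIPAL="stefano.baghino@EXAMPLE.COM"

# -m yarn-cluster starts a dedicated YARN cluster for this single job.
CMD="bin/flink run -m yarn-cluster -yn 2 -ytm 4096 -ys 3"
CMD="$CMD examples/batch/WordCount.jar"
CMD="$CMD --input hdfs:///user/stefano.baghino/hamlet.txt"
CMD="$CMD --output hdfs:///user/stefano.baghino/hamlet.out"

if [ "${RUN:-0}" = "1" ]; then
  # Obtain a ticket as the job's user, then submit.
  # $CMD is left unquoted on purpose so it splits into separate arguments.
  kinit "$PRINCIPAL" && $CMD
else
  # Dry run: show what would be executed.
  echo "kinit $PRINCIPAL"
  echo "$CMD"
fi
```

Since the cluster is started by the same user who submits the job, HDFS
access should then be checked against that user rather than the service
account.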



On Sun, Mar 6, 2016 at 9:27 PM, Stefano Baghino
<stefano.baghino@radicalbit.io> wrote:
> One last note: initially I tried to run the session as the same OS user,
> running kdestroy and then kinit with the other user, and got this error.
> Trying to run the job in a different OS session, authenticating with
> Kerberos as the user who should run the job, I can't connect to the
> JobManager. I've added a second log with this error to the gist.
> On Sun, Mar 6, 2016 at 9:01 PM, Stefano Baghino
> <stefano.baghino@radicalbit.io> wrote:
>> In the initial description, I meant "I'm trying to access a private folder
>> of the latter", so not the service account. Sorry for the mistake.
>> On Sun, Mar 6, 2016 at 8:54 PM, Stefano Baghino
>> <stefano.baghino@radicalbit.io> wrote:
>>> Hello everybody,
>>> I'm running some tests on how Flink as a long-running YARN session
>>> handles security with Kerberos. In particular, I'm running a test where I
>>> run Flink on YARN with a service account and then deploy a job via CLI as
>>> another user; in the job I'm trying to access a private folder of the former
>>> on HDFS but the job fails due to permission issues (the user running the job
>>> is actually the one who ran Flink on YARN in the first place — the service
>>> account).
>>> I'm running Flink 1.0.0-RC5, launching the long-running session with:
>>> bin/yarn-session.sh -n 2 -tm 4096 -s 3
>>> and then running the following command:
>>> bin/flink run examples/batch/WordCount.jar \
>>> --input hdfs:///user/stefano.baghino/hamlet.txt \
>>> --output hdfs:///user/stefano.baghino/hamlet.out
>>> Here are the logs:
>>> https://gist.github.com/stefanobaghino/6605ec33a1c4b632fb78
>>> It looks like the YARN session is acting as a proxy for the user instead
>>> of receiving a delegation. Is there a way to change this behavior? Is this
>>> by design? Is there an interest in implementing the delegation (if it's not
>>> already implemented)? Otherwise, is there a workaround, apart from running
>>> one-off jobs on YARN?
>>> Thank you so much in advance.
>>> --
>>> BR,
>>> Stefano Baghino
>>> Software Engineer @ Radicalbit
