flink-user mailing list archives

From Stefano Baghino <stefano.bagh...@radicalbit.io>
Subject Re: Error when accessing secure HDFS with standalone Flink
Date Wed, 16 Mar 2016 13:28:44 GMT
Hi Max,

thanks for clarifying the job ownership question.

Regarding the security configuration, we set the HADOOP_CONF_DIR
environment variable.
Right now we're testing YARN again; if we go back to standalone and come up
with better information about the failure, I'll write again.
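
In case it helps anyone retracing this, here is a minimal sketch (illustrative
only, not taken from our setup) of how one could check, on each node, which
authentication mode the core-site.xml under HADOOP_CONF_DIR declares. The
function name hadoop_auth_mode is made up for this example; the property name
hadoop.security.authentication and its default of "simple" come from Hadoop.

```python
import os
import xml.etree.ElementTree as ET


def hadoop_auth_mode(conf_dir):
    """Return the hadoop.security.authentication value from core-site.xml.

    hadoop_auth_mode is a hypothetical helper, not part of Flink or Hadoop.
    Hadoop defaults to "simple" when the property is absent.
    """
    core_site = os.path.join(conf_dir, "core-site.xml")
    root = ET.parse(core_site).getroot()
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name == "hadoop.security.authentication":
            return (prop.findtext("value") or "simple").strip().lower()
    return "simple"


if __name__ == "__main__":
    # Run as the same user, in the same shell environment, that starts Flink.
    conf_dir = os.environ.get("HADOOP_CONF_DIR")
    if not conf_dir:
        print("HADOOP_CONF_DIR is not set in this environment")
    else:
        print(conf_dir, "->", hadoop_auth_mode(conf_dir))
```

If this reports "simple" or an unset variable on any node, the process started
there would presumably not attempt Kerberos, which would match the error we saw.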

Thank you for taking the time to help me!

On Wed, Mar 16, 2016 at 2:17 PM, Maximilian Michels <mxm@apache.org> wrote:

> Hi Stefano,
>
> The preparations for Kerberos which you described look correct.
>
> Taking a closer look at the Exception, it seems like the Hadoop config
> or environment variables are not correctly set. It keeps trying to
> authenticate SIMPLE but on the remote side only Kerberos is available.
> Have you added the Hadoop config dir to the Flink config or,
> alternatively, set the HADOOP_CONF_DIR environment variable on the
> nodes?
>
> Just like in YARN, in standalone mode every job is run as the user
> who started the cluster.
>
> Cheers,
> Max
>
> On Wed, Mar 16, 2016 at 10:50 AM, Stefano Baghino
> <stefano.baghino@radicalbit.io> wrote:
> > Hi Max,
> >
> > thanks for the tips. What we did was run kinit on each node as the same
> > user that then ran the start-cluster.sh script. Right now the LDAP groups
> > are backed by the OS ones, and the user that ran the launch script is part
> > of the flink group, which exists on every node of the cluster and has full
> > access to the flink directory (which is placed under the same path on
> > every node).
> >
> > Would this have been enough to kerberize Flink?
> >
> > Also: once a user runs Flink in secure mode, is every deployed job run as
> > the user that ran the start-cluster.sh script (same behavior as running a
> > YARN session)? Or can users kinit on each node and then submit jobs that
> > will be individually run with their credentials?
> >
> > Thanks again.
> >
> > On Wed, Mar 16, 2016 at 10:30 AM, Maximilian Michels <mxm@apache.org> wrote:
> >>
> >> Hi Stefano,
> >>
> >> You have probably seen
> >> https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/config.html#kerberos
> >> ?
> >>
> >> Currently, all nodes need to be authenticated with Kerberos before
> >> Flink is started (not just the JobManager). Could it be that the
> >> start-cluster.sh script actually is not authenticated using Kerberos
> >> at the nodes it sshs to when it starts the TaskManagers?
> >>
> >> Best,
> >> Max
> >>
> >>
> >> On Fri, Mar 11, 2016 at 8:17 AM, Stefano Baghino
> >> <stefano.baghino@radicalbit.io> wrote:
> >> > Hello everybody,
> >> >
> >> > my colleagues and I have been running some tests on Flink 1.0.0 in a
> >> > secure environment (Kerberos). Yesterday we ran several tests on the
> >> > standalone Flink deployment but couldn't get it to access HDFS. Judging
> >> > from the error, it looks like Flink is not trying to authenticate itself
> >> > with Kerberos. The root cause of the error is
> >> > "org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]".
> >> > I've put the whole logs in this gist. I went through the source code and,
> >> > judging from what I saw, this error is emitted by Hadoop if a client is
> >> > not using any authentication method on a secure cluster. Also, in the
> >> > source code of Flink, it looks like a log message (at INFO level) stating
> >> > the fact should be printed when running a job on a secure cluster.
> >> >
> >> > To go through the steps I followed to set up the environment: I built
> >> > Flink and put it in the same folder on the two nodes of the cluster,
> >> > adjusted the configs, and assigned its ownership (and write permissions)
> >> > to a group. Then I ran kinit with a user belonging to that group on both
> >> > nodes, and finally I ran start-cluster.sh and deployed the job. I tried
> >> > running the job both as the same user who ran the start-cluster.sh script
> >> > and as another one (still authenticated with Kerberos on both nodes).
> >> >
> >> > The core-site.xml correctly states that the authentication method is
> >> > kerberos, and using the hdfs CLI everything runs as expected. Thinking it
> >> > could be an error tied to permissions on the core-site.xml file, I also
> >> > added the user running the start-cluster.sh script to the hadoop group,
> >> > which owned the file. Unfortunately, this yielded the same result.
> >> >
> >> > Can you help me troubleshoot this issue? Thank you so much in advance!
> >> >
> >> > --
> >> > BR,
> >> > Stefano Baghino
> >> >
> >> > Software Engineer @ Radicalbit
> >
> >
> >
> >
> > --
> > BR,
> > Stefano Baghino
> >
> > Software Engineer @ Radicalbit
>



-- 
BR,
Stefano Baghino

Software Engineer @ Radicalbit
