flink-dev mailing list archives

From Shuyi Chen <suez1...@gmail.com>
Subject Re: Flink-Yarn-Kerberos integration
Date Wed, 03 Jan 2018 23:43:19 GMT
Thanks a lot for the clarification, Eron. That's very helpful. Currently,
we are more concerned about 1) data access, but will get to 2) and 3)
eventually.

I was thinking of doing the following:
1) Extend the current HadoopModule to use and refresh DTs, as suggested in the
YARN Application Security docs.
2) I found that the current SecurityModule interface might be enough for
supporting other security mechanisms. However, the loading of security
modules is hard-coded, not configuration-based. I think we can extend
SecurityUtils to load them from the configuration, so we can implement our
own security mechanism in our internal repo and have Flink jobs load it at
runtime.
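
To make 2) concrete, here is a minimal sketch of configuration-based
loading. It is only an illustration under stated assumptions, not Flink's
actual code: the SecurityModule interface below is a simplified stand-in
for Flink's interface, and the class name ConfigurableSecurityModules, the
NoopModule example, and the comma-separated class-name format are all
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ConfigurableSecurityModules {

    /** Simplified stand-in for Flink's SecurityModule interface (assumption). */
    public interface SecurityModule {
        void install() throws Exception;
        void uninstall() throws Exception;
    }

    /** Hypothetical example module that only records whether it is installed. */
    public static class NoopModule implements SecurityModule {
        public boolean installed = false;

        @Override
        public void install() {
            installed = true;
        }

        @Override
        public void uninstall() {
            installed = false;
        }
    }

    /**
     * Instantiates and installs every module named in a comma-separated list
     * of fully qualified class names, in the order given. In a real setup the
     * list would come from a configuration key (hypothetical here).
     */
    public static List<SecurityModule> load(String classNames) throws Exception {
        List<SecurityModule> modules = new ArrayList<>();
        if (classNames == null || classNames.trim().isEmpty()) {
            return modules;
        }
        ClassLoader loader = ConfigurableSecurityModules.class.getClassLoader();
        for (String name : classNames.split(",")) {
            String trimmed = name.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // Resolve the class by name and require that it implements SecurityModule.
            Class<? extends SecurityModule> clazz =
                    Class.forName(trimmed, true, loader).asSubclass(SecurityModule.class);
            // Each module needs a public no-argument constructor.
            SecurityModule module = clazz.getDeclaredConstructor().newInstance();
            module.install();
            modules.add(module);
        }
        return modules;
    }

    public static void main(String[] args) throws Exception {
        List<SecurityModule> modules = load(NoopModule.class.getName());
        System.out.println("Installed modules: " + modules.size());
    }
}
```

With something like this, an internal module would only need to be on the
classpath and named in the configuration. Modules should be uninstalled in
reverse order on shutdown.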

Please let me know your comments. Thanks a lot.

On Fri, Dec 22, 2017 at 3:05 PM, Eron Wright <eronwright@gmail.com> wrote:

> I agree that it is reasonable to use Hadoop DTs as you describe.  That
> approach is even recommended in YARN's documentation (see Securing
> Long-lived YARN Services on the YARN Application Security page).   But one
> of the goals of Kerberos integration is to support Kerberized data access
> for connectors other than HDFS, such as Kafka, Cassandra, and
> Elasticsearch.   So your second point makes sense too, suggesting a general
> architecture for managing secrets (DTs, keytabs, certificates, oauth
> tokens, etc.) within the cluster.
>
> There are quite a few aspects to Flink security, including:
> 1. data access (e.g. how a connector authenticates to a data source)
> 2. service authorization and network security (e.g. how a Flink cluster
> protects itself from unauthorized access)
> 3. multi-user support (e.g. multi-user Flink clusters, RBAC)
>
> I mention these aspects to clarify your point about AuthN, which I took to
> be related to (1).   Do tell if I misunderstood.
>
> Eron
>
>
> On Wed, Dec 20, 2017 at 11:21 AM, Shuyi Chen <suez1224@gmail.com> wrote:
>
> > Hi community,
> >
> > We are working on secure Flink on YARN. The current Flink-Yarn-Kerberos
> > integration requires each container of a job to log in to Kerberos via
> > keytab every, say, 24 hours, and does not use any Hadoop delegation token
> > mechanism except when localizing the container. As I fixed the current
> > Flink-Yarn-Kerberos integration (FLINK-8275) and tried to add more
> > features (FLINK-7860), I developed some concerns regarding the current
> > implementation. It can pose a scalability issue for the KDC, e.g., if the
> > YARN cluster is restarted and all tens of thousands of containers suddenly
> > DDoS the KDC.
> >
> > I would like to propose improving the current Flink-YARN-Kerberos
> > integration by doing something like the following:
> > 1) The AppMaster (JobManager) periodically authenticates to the KDC and
> > gets all required DTs for the job.
> > 2) All other TM or TE containers periodically retrieve new DTs from the
> > AppMaster (either through a secure HDFS folder or a secure Akka channel).
> >
> > Also, we want to extend Flink to support a pluggable AuthN mechanism,
> > because we have our own internal AuthN mechanism. We would like to add
> > support in Flink to authenticate periodically to our internal AuthN
> > service as well, through, e.g., dynamic class loading, and use a similar
> > mechanism to distribute the credentials from the AppMaster to the
> > containers.
> >
> > I would like to get comments and feedback. I can also write a design doc
> > or create a FLIP if needed. Thanks a lot.
> >
> > Shuyi
> >
> >
> >
> > --
> > "So you have to trust that the dots will somehow connect in your future."
> >
>



-- 
"So you have to trust that the dots will somehow connect in your future."
