airflow-dev mailing list archives

From Bolke de Bruin <bdbr...@gmail.com>
Subject Re: Kerberos and Airflow
Date Thu, 02 Aug 2018 17:51:14 GMT
Hi Dan,

I discussed this a little bit with one of the security architects here. We think
that you can strike a fair trade-off between security and usability by shipping
a kind of manifest with the DAG you are submitting. This manifest can then
specify what the generated tasks/DAGs are allowed to do and what metadata
to provide to them. We could also let the scheduler generate hashes per generated
DAG / task and verify those against an established version (1st run?). This
limits the attack vector.
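
A minimal sketch of the hash check described above, assuming a generated task
can be reduced to a plain dict and that the "established version" is simply the
first hash seen for a given key (trust on first use). The names
task_fingerprint and verify_or_register and the in-memory store are
hypothetical, not existing Airflow APIs:

```python
import hashlib
import json


def task_fingerprint(task_spec: dict) -> str:
    """Return a SHA-256 hex digest of a canonical JSON form of the task."""
    # sort_keys makes the digest independent of dict insertion order.
    canonical = json.dumps(task_spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_or_register(store: dict, key: str, task_spec: dict) -> bool:
    """Record the hash on first sight; afterwards compare against it."""
    digest = task_fingerprint(task_spec)
    established = store.setdefault(key, digest)
    return established == digest


# Hypothetical stand-in for a table in the metadata database.
store = {}
spec = {"operator": "BashOperator", "cmd": "sleep 1", "user": "Foo"}
first_ok = verify_or_register(store, "MyDAG.MyTask", spec)

# A generated task whose definition changed after the first run is rejected.
tampered = dict(spec, cmd="cat /tmp/some.keytab")
second_ok = verify_or_register(store, "MyDAG.MyTask", tampered)
```

A real deployment would also need an explicit way to approve a new hash when a
DAG legitimately changes.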

A DagSerializer would be great, but I think it solves a different issue and the above 
is somewhat simpler to implement?

Bolke

> On 29 Jul 2018, at 23:47, Dan Davydov <ddavydov@twitter.com.INVALID> wrote:
> 
> *Let’s say we trust the owner field of the DAGs I think we could do the
> following.*
> *Obviously, the trusting the user part is key here. It is one of the
> reasons I was suggesting using “airflow submit” to update / add dags in
> Airflow*
> 
> 
> *This is the hard part about my question.*
> I think in a true multi-tenant environment we wouldn't be able to trust the
> user, otherwise we wouldn't necessarily even need a mapping of Airflow DAG
> users to secrets, because if we trust users to set the correct Airflow user
> for DAGs, we are basically trusting them with all of the creds the Airflow
> scheduler can access for all users anyways.
> 
> I actually had the same thought as your "airflow submit" a while ago, which
> I discussed with Alex, basically creating an API for adding DAGs instead of
> having the Scheduler parse them. FWIW I think it's superior to the git time
> machine approach because it's a more generic form of "serialization" and is
> more correct as well because the same DAG file parsed on a given git SHA
> can produce different DAGs. Let me know what you think, and maybe I can
> start a more formal design doc if you are onboard:
> 
> A user or service with an auth token sends an "airflow submit" request to a
> new kind of Dag Serialization service, along with the serialized DAG
> objects generated by parsing on the client. It's important that these
> serialized objects are declarative and not e.g. pickles so that the
> scheduler/workers can consume them and reproducibility of the DAGs is
> guaranteed. The service will then store each generated DAG along with its
> access based on the provided token e.g. using Ranger, and the
> scheduler/workers will use the stored DAGs for scheduling/execution.
> Operators would be deployed along with the Airflow code separately from the
> serialized DAGs.
> 
> A serialized DAG would look something like this (basically Luigi-style :)):
> MyTask - BashOperator: {
>  cmd: "sleep 1"
>  user: "Foo"
>  access: "token1", "token2"
> }
> 
> MyDAG: {
>  MyTask1 >> SomeOtherTask1
>  MyTask2 >> SomeOtherTask1
> }
> 
> Dynamic DAGs in this case would just consist of a service calling "Airflow
> Submit" that does its own form of authentication to get access to some
> kind of tokens (or basically just forwarding the secrets the users of the
> dynamic DAG submit).
> 
> For the default Airflow implementation you can maybe just have the Dag
> Serialization server bundled with the Scheduler, with auth turned off, and
> to periodically update the Dag Serialization store which would emulate the
> current behavior closely.
> 
> Pros:
> 1. Consistency across running task instances in a dagrun/scheduler,
> reproducibility and auditability of DAGs
> 2. Users can control when to deploy their DAGs
> 3. Scheduler runs much faster since it doesn't have to run python files and
> e.g. make network calls
> 4. Scaling the scheduler becomes easier because a different service can be
> responsible for parsing DAGs, and it can be trivially scaled horizontally
> (clients are doing the parsing)
> 5. Potentially makes creating ad-hoc DAGs/backfilling/iterating on DAGs
> easier? e.g. can use the Scheduler itself to schedule backfills with a
> slightly modified serialized version of a DAG.
> 
> Cons:
> 1. Have to deprecate a lot of popular features, e.g. allowing custom
> callbacks in operators (e.g. on_failure), and jinja_templates
> 2. Version compatibility problems, e.g. user/service client might be
> serializing arguments for hooks/operators that have been deprecated in
> newer versions of the hooks, or the serialized DAG schema changes and old
> DAGs aren't automatically updated. Might want to have some kind of
> versioning system for serialized DAGs to at least ensure that stored DAGs
> are valid when the Scheduler/Worker/etc are upgraded, maybe something
> similar to thrift/protobuf versioning.
> 3. Additional complexity - additional service, logic on workers/scheduler
> to fetch/cache serialized DAGs efficiently, expiring/archiving old DAG
> definitions, etc
> 
> 
> On Sun, Jul 29, 2018 at 3:20 PM Bolke de Bruin <bdbruin@gmail.com> wrote:
> 
>> Ah gotcha. That’s another issue actually (but related).
>> 
>> Let’s say we trust the owner field of the DAGs I think we could do the
>> following. We then have a table (and interface) to tell Airflow what users
>> have access to what connections. The scheduler can then check if the task
>> in the dag can access the conn_id it is asking for. Auto generated dags
>> still have an owner (or should) and therefore should be fine. Some
>> integrity checking could/should be added as we want to be sure that the
>> task we schedule is the task we launch. So a signature calculated at the
>> scheduler (or part of the DAG), sent as part of the metadata and checked by
>> the executor is probably smart.
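
A rough sketch of the two checks in the paragraph above, assuming the
user-to-connection table can be modelled as a dict and that the scheduler and
executor share a secret key. ACCESS, can_use, sign_task and verify_task are
hypothetical names, and HMAC-SHA256 is only one possible choice of signature:

```python
import hashlib
import hmac
import json

# Hypothetical access table: DAG owner -> conn_ids that owner may use.
ACCESS = {"alice": {"hive_default"}, "bob": {"http_default"}}


def can_use(owner: str, conn_id: str) -> bool:
    """Scheduler-side check before a task is queued."""
    return conn_id in ACCESS.get(owner, set())


def sign_task(secret: bytes, task: dict) -> str:
    """Sign a canonical JSON form of the task metadata."""
    payload = json.dumps(task, sort_keys=True).encode("utf-8")
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_task(secret: bytes, task: dict, signature: str) -> bool:
    """Executor-side check that the launched task is the scheduled one."""
    return hmac.compare_digest(sign_task(secret, task), signature)


secret = b"shared-between-scheduler-and-executor"  # assumption: pre-shared key
task = {"dag_id": "MyDAG", "task_id": "MyTask", "owner": "alice",
        "conn_id": "hive_default"}

allowed = can_use(task["owner"], task["conn_id"])
signature = sign_task(secret, task)
intact = verify_task(secret, task, signature)
forged = verify_task(secret, dict(task, conn_id="http_default"), signature)
```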
>> 
>> You can also make this more fancy by integrating with something like
>> Apache Ranger that allows for policy checking.
>> 
>> Obviously, the trusting the user part is key here. It is one of the
>> reasons I was suggesting using “airflow submit” to update / add dags in
>> Airflow. We could enforce authentication on the DAG. It was kind of ruled
>> out in favor of git time machines although these never happened afaik ;-).
>> 
>> BTW: I have updated my implementation with protobuf. Metadata is now
>> available at executor and task.
>> 
>> 
>>> On 29 Jul 2018, at 15:47, Dan Davydov <ddavydov@twitter.com.INVALID> wrote:
>>> 
>>> The concern is how to secure secrets on the scheduler such that only
>>> certain DAGs can access them, and in the case of files that create DAGs
>>> dynamically, only some set of DAGs should be able to access these secrets.
>>> 
>>> e.g. if there is a secret/keytab that can be read by DAG A generated by
>>> file X, and file X generates DAG B as well, there needs to be a scheme to
>>> stop the parsing of DAG B on the scheduler from being able to read the
>>> secret in DAG A.
>>> 
>>> Does that make sense?
>>> 
>>> On Sun, Jul 29, 2018 at 6:14 AM Bolke de Bruin <bdbruin@gmail.com> wrote:
>>> 
>>>> I’m not sure what you mean. The example I created allows for dynamic DAGs,
>>>> as the scheduler obviously knows about the tasks when they are ready to be
>>>> scheduled. This isn’t any different for a static DAG or a dynamic one.
>>>> 
>>>> Kerberos isn’t that special here. Basically a keytab is a set of revocable
>>>> user credentials in a special format. The keytab itself can be protected by
>>>> a password. So I can imagine that a connection is defined that sets a keytab
>>>> location and a password to access the keytab. The scheduler understands this
>>>> (or maybe the Connection model does) and serializes and sends it to the
>>>> worker as part of the metadata. The worker then reconstructs the keytab and
>>>> issues a kinit or supplies it to the other service requiring it (e.g. Spark).
>>>> * Obviously the worker and scheduler need to communicate over SSL.
>>>> * There is a challenge at the worker level. Credentials are secured
>>>> against other users, but are readable by the owning user. So imagine 2 DAGs
>>>> from two different users with different connections without sudo
>>>> configured. If they end up at the same worker and DAG 2 is malicious, it
>>>> could read files and memory created by DAG 1. This is the reason why using
>>>> environment variables is NOT safe (DAG 2 could read /proc/<pid>/environ).
>>>> To mitigate this we probably need to PIPE the data to the task’s STDIN. It
>>>> won’t solve the issue but will make it harder as now it will only be in
>>>> memory.
>>>> * The reconstructed keytab (or the initialized version) can be stored in,
>>>> most likely, the process-keyring (
>>>> http://man7.org/linux/man-pages/man7/process-keyring.7.html). As
>>>> mentioned earlier this poses a challenge for Java applications that cannot
>>>> read from this location (keytab and ccache). Writing it out to the
>>>> filesystem then becomes a possibility. This is essentially the same as how
>>>> Spark solves it (
>>>> https://spark.apache.org/docs/latest/security.html#yarn-mode).
>>>> 
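
The STDIN idea from the list above could look roughly like this. It is a sketch
only: the child command is a hypothetical stand-in for a real task, and the
credential values are placeholders. The point is that the secret goes to the
child over a pipe rather than through its environment, so it does not show up
in /proc/<pid>/environ:

```python
import json
import subprocess
import sys

# Hypothetical task process: reads its credentials as JSON from STDIN.
CHILD = "import sys, json; creds = json.load(sys.stdin); print(creds['conn_id'])"


def run_task_with_credentials(creds: dict) -> str:
    """Launch the task and pipe the credentials to its STDIN."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=json.dumps(creds),  # written to the child's STDIN pipe
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


output = run_task_with_credentials(
    {"conn_id": "hive_default", "password": "not-a-real-secret"}
)
```

As noted in the list, this only raises the bar: the secret still lives in the
memory of the task process.
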
>>>> Why not work on this together? We need it as well. We consider Airflow as
>>>> it is now the biggest security threat, and it is really hard to secure.
>>>> The above would definitely be a serious improvement. Another step would be
>>>> to stop Tasks from accessing the Airflow DB altogether.
>>>> 
>>>> Cheers
>>>> Bolke
>>>> 
>>>>> On 29 Jul 2018, at 05:36, Dan Davydov <ddavydov@twitter.com.INVALID> wrote:
>>>>> 
>>>>> This makes sense, and thanks for putting this together. I might pick this
>>>>> up myself depending on if we can get the rest of the multi-tenancy story
>>>>> nailed down, but I still think the tricky part is figuring out how to allow
>>>>> dynamic DAGs (e.g. DAGs created from rows in a MySQL table) to work with
>>>>> Kerberos, curious what your thoughts are there. How would secrets be passed
>>>>> securely in a multi-tenant Scheduler starting from parsing the DAGs up to
>>>>> the executor sending them off?
>>>>> 
>>>>> On Sat, Jul 28, 2018 at 5:07 PM Bolke de Bruin <bdbruin@gmail.com> wrote:
>>>>> 
>>>>>> Here:
>>>>>> 
>>>>>> https://github.com/bolkedebruin/airflow/tree/secure_connections
>>>>>> 
>>>>>> Is a working rudimentary implementation that allows securing the
>>>>>> connections (only LocalExecutor at the moment)
>>>>>> 
>>>>>> * It enforces the use of “conn_id” instead of the mix that we have now
>>>>>> * A task, if using “conn_id”, has ‘auto-registered’ (which is a noop) its
>>>>>> connections
>>>>>> * The scheduler reads the connection information and serializes it to
>>>>>> JSON (which should be a different format, protobuf preferably)
>>>>>> * The scheduler then sends this info to the executor
>>>>>> * The executor puts this in the environment of the task (environment most
>>>>>> likely not secure enough for us)
>>>>>> * The BaseHook reads out this environment variable and does not need to
>>>>>> touch the database
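
For illustration, the flow in the list above might be sketched as follows. This
is not the code in the linked branch: the variable name
AIRFLOW_TASK_CONNECTIONS and both functions are hypothetical, and the thread
itself points out that the environment is probably not a safe transport:

```python
import json
import os

ENV_KEY = "AIRFLOW_TASK_CONNECTIONS"  # hypothetical variable name


def export_connections(connections: dict, env: dict) -> None:
    """Executor side: serialize the task's connections into its environment."""
    env[ENV_KEY] = json.dumps(connections)


def get_connection(conn_id: str, env=os.environ) -> dict:
    """Hook side: resolve a conn_id from supplied data, never from the DB."""
    supplied = json.loads(env.get(ENV_KEY, "{}"))
    try:
        return supplied[conn_id]
    except KeyError:
        raise LookupError(
            f"connection {conn_id!r} was not supplied to this task"
        ) from None


# A plain dict stands in for the environment of the task process.
task_env = {}
export_connections({"http_default": {"host": "example.com", "port": 443}}, task_env)
conn = get_connection("http_default", env=task_env)
```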
>>>>>> 
>>>>>> The example_http_operator works, I haven’t tested any other. To make it
>>>>>> work I just adjusted the hook and operator to use “conn_id” instead
>>>>>> of the non-standard http_conn_id.
>>>>>> 
>>>>>> Makes sense?
>>>>>> 
>>>>>> B.
>>>>>> 
>>>>>> * The BaseHook is adjusted to not connect to the database
>>>>>>> On 28 Jul 2018, at 17:50, Bolke de Bruin <bdbruin@gmail.com> wrote:
>>>>>>> 
>>>>>>> Well, I don’t think a hook (or task) should obtain it by itself. It
>>>>>>> should be supplied. At the moment you start executing the task you
>>>>>>> cannot trust it anymore (i.e. it is unmanaged / non-Airflow code).
>>>>>>> 
>>>>>>> So we could change the BaseHook to understand supplied credentials and
>>>>>>> populate a hash with “conn_ids”. Hooks normally call
>>>>>>> BaseHook.get_connection anyway, so it shouldn’t be too hard and should in
>>>>>>> principle not require changes to the hooks themselves if they are well
>>>>>>> behaved.
>>>>>>> 
>>>>>>> B.
>>>>>>> 
>>>>>>>> On 28 Jul 2018, at 17:41, Dan Davydov <ddavydov@twitter.com.INVALID> wrote:
>>>>>>>> 
>>>>>>>> *So basically in the scheduler we parse the dag. Either from the manifest
>>>>>>>> (new) or from smart parsing (probably harder, maybe some auto register?)
>>>>>>>> we know what connections and keytabs are available dag wide or per task.*
>>>>>>>> This is the hard part that I was curious about, for dynamically created
>>>>>>>> DAGs, e.g. those generated by reading tasks in a MySQL database or a JSON
>>>>>>>> file, there isn't a great way to do this.
>>>>>>>> 
>>>>>>>> I 100% agree with deprecating the connections table (at least for the
>>>>>>>> secure option). The main work there is rewriting all hooks to take
>>>>>>>> credentials from arbitrary data sources by allowing a customized
>>>>>>>> CredentialsReader class. Although hooks are technically private, I think
>>>>>>>> a lot of companies depend on them so the PMC should probably discuss if
>>>>>>>> this is an Airflow 2.0 change or not.
>>>>>>>> 
>>>>>>>> On Fri, Jul 27, 2018 at 5:24 PM Bolke de Bruin <bdbruin@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> Sure. In general I consider keytabs as a part of connection information.
>>>>>>>>> Connections should be secured by sending the connection information a
>>>>>>>>> task needs as part of the information the executor gets. A task should
>>>>>>>>> then not need access to the connection table in Airflow. Keytabs could
>>>>>>>>> then be sent as part of the connection information (base64 encoded) and
>>>>>>>>> set up by the executor (this key) to be read-only to the task it is
>>>>>>>>> launching.
>>>>>>>>> 
>>>>>>>>> So basically in the scheduler we parse the dag. Either from the manifest
>>>>>>>>> (new) or from smart parsing (probably harder, maybe some auto register?)
>>>>>>>>> we know what connections and keytabs are available dag wide or per task.
>>>>>>>>> 
>>>>>>>>> The credentials and connection information then are serialized into a
>>>>>>>>> protobuf message and sent to the executor as part of the “queue” action.
>>>>>>>>> The worker then deserializes the information and makes it securely
>>>>>>>>> available to the task (which is quite hard btw).
>>>>>>>>> 
>>>>>>>>> On that last bit, making the info securely available might mean storing
>>>>>>>>> it in the Linux KEYRING (supported by python keyring). Keytabs will be
>>>>>>>>> tough to do properly due to Java not properly supporting KEYRING, only
>>>>>>>>> files, and these are hard to make secure (due to the possibility a
>>>>>>>>> process will list all files in /tmp and get credentials through that).
>>>>>>>>> Maybe storing the keytab with a password and having the password in the
>>>>>>>>> KEYRING might work. Something to find out.
>>>>>>>>> 
>>>>>>>>> B.
>>>>>>>>> 
>>>>>>>>> Sent from my iPad
>>>>>>>>> 
>>>>>>>>>> On 27 Jul 2018, at 22:04, Dan Davydov <ddavydov@twitter.com.INVALID> wrote:
>>>>>>>>>> 
>>>>>>>>>> I'm curious if you had any ideas on how to enable multi-tenancy
>>>>>>>>>> with respect to Kerberos in Airflow.
>>>>>>>>>> 
>>>>>>>>>>> On Fri, Jul 27, 2018 at 2:38 PM Bolke de Bruin <bdbruin@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Cool. The doc will need some refinement as it isn't entirely accurate.
>>>>>>>>>>> In addition we need to distinguish between Airflow as a client of
>>>>>>>>>>> kerberized services (this is what is talked about in the astronomer
>>>>>>>>>>> doc) vs kerberizing airflow itself, which the API supports.
>>>>>>>>>>> 
>>>>>>>>>>> In general, to access kerberized services (airflow as a client) one
>>>>>>>>>>> needs to start the ticket renewer with a valid keytab. For the hooks it
>>>>>>>>>>> isn't always required to change the hook to support it. Hadoop cli
>>>>>>>>>>> tools often just pick it up as their client config is set to do so.
>>>>>>>>>>> Then there is another class of HTTP-like services which are accessed by
>>>>>>>>>>> urllib under the hood; these typically use SPNEGO. These often need to
>>>>>>>>>>> be adjusted as it requires some urllib config. Finally, there are
>>>>>>>>>>> protocols which use SASL with kerberos, like HDFS (not webhdfs, that
>>>>>>>>>>> uses SPNEGO). These require per-protocol implementations.
>>>>>>>>>>> 
>>>>>>>>>>> Off the top of my head we support kerberos client side now with:
>>>>>>>>>>> 
>>>>>>>>>>> * Spark
>>>>>>>>>>> * HDFS (snakebite python 2.7, cli and with the upcoming libhdfs
>>>>>>>>>>> implementation)
>>>>>>>>>>> * Hive (not metastore afaik)
>>>>>>>>>>> Three things to remember:
>>>>>>>>>>> 
>>>>>>>>>>> * If a job (i.e. Spark job) will finish later than the maximum ticket
>>>>>>>>>>> lifetime you probably need to provide a keytab to said application.
>>>>>>>>>>> Otherwise you will get failures after the expiry.
>>>>>>>>>>> * A keytab (used by the renewer) is a set of credentials (user and
>>>>>>>>>>> pass), so jobs are executed under the keytab in use at that moment
>>>>>>>>>>> * Securing keytabs in multi-tenant airflow is a challenge. This also
>>>>>>>>>>> goes for securing connections. This we need to fix at some point.
>>>>>>>>>>> Solution for now seems to be no multi tenancy.
>>>>>>>>>>> 
>>>>>>>>>>> Kerberos seems harder than it is btw. Still, we are sometimes moving
>>>>>>>>>>> away from it to OAUTH2 based authentication. This gets us closer to
>>>>>>>>>>> cloud standards (but we are on prem)
>>>>>>>>>>> 
>>>>>>>>>>> B.
>>>>>>>>>>> 
>>>>>>>>>>> Sent from my iPhone
>>>>>>>>>>> 
>>>>>>>>>>>> On 27 Jul 2018, at 17:41, Hitesh Shah <hitesh@apache.org> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Hi Taylor
>>>>>>>>>>>> 
>>>>>>>>>>>> +1 on upstreaming this. It would be great if you can submit a pull
>>>>>>>>>>>> request to enhance the apache airflow docs.
>>>>>>>>>>>> 
>>>>>>>>>>>> thanks
>>>>>>>>>>>> Hitesh
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Thu, Jul 26, 2018 at 2:32 PM Taylor Edmiston <tedmiston@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> While we're on the topic, I'd love any feedback from Bolke or others
>>>>>>>>>>>>> who've used Kerberos with Airflow on this quick guide I put together
>>>>>>>>>>>>> yesterday. It's similar to what's in the Airflow docs but instead all
>>>>>>>>>>>>> on one page and slightly expanded.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> https://github.com/astronomerio/airflow-guides/blob/master/guides/kerberos.md
>>>>>>>>>>>>> (or web version <https://www.astronomer.io/guides/kerberos/>)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> One thing I'd like to add is a minimal example of how to Kerberize a
>>>>>>>>>>>>> hook.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I'd be happy to upstream this as well if it's useful (maybe a
>>>>>>>>>>>>> Concepts > Additional Functionality > Kerberos page?)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Taylor
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> *Taylor Edmiston*
>>>>>>>>>>>>> Blog <https://blog.tedmiston.com/> | CV
>>>>>>>>>>>>> <https://stackoverflow.com/cv/taylor> | LinkedIn
>>>>>>>>>>>>> <https://www.linkedin.com/in/tedmiston/> | AngelList
>>>>>>>>>>>>> <https://angel.co/taylor> | Stack Overflow
>>>>>>>>>>>>> <https://stackoverflow.com/users/149428/taylor-edmiston>
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Thu, Jul 26, 2018 at 5:18 PM, Driesprong, Fokko <fokko@driesprong.frl> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi Ry,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> You should ask Bolke de Bruin. He's really experienced with Kerberos
>>>>>>>>>>>>>> and he also did the implementation for Airflow. Besides that, he also
>>>>>>>>>>>>>> worked on implementing Kerberos in Ambari. Just want to let you know.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Cheers, Fokko
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Thu, 26 Jul 2018 at 23:03, Ry Walker <ry@astronomer.io> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi everyone -
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> We have several bigCo's who are considering using Airflow asking
>>>>>>>>>>>>>>> about its support for Kerberos.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> We're going to work on a proof-of-concept next week, will likely
>>>>>>>>>>>>>>> record a screencast on it.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> For now, we're looking for any anecdotal information from
>>>>>>>>>>>>>>> organizations who are using Kerberos with Airflow. If anyone would be
>>>>>>>>>>>>>>> willing to share their experiences here, or reply to me personally,
>>>>>>>>>>>>>>> it would be greatly appreciated!
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> -Ry
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> *Ry Walker* | CEO, Astronomer <http://www.astronomer.io/> |
>>>>>>>>>>>>>>> 513.417.2163 | @rywalker <http://twitter.com/rywalker> | LinkedIn
>>>>>>>>>>>>>>> <http://www.linkedin.com/in/rywalker>

