airflow-users mailing list archives

From Franco Peschiera <franco.peschi...@gmail.com>
Subject Re: extension of the REST API
Date Tue, 27 Oct 2020 16:53:45 GMT
Thanks Jarek,

(I forgot to answer this last email.)

We're going to end up using git-sync and a GitHub repository to deploy
our DAGs.

I do see a couple of disadvantages with respect to our way of using
Airflow. Namely, as our DAG versions change, we will have to make
matching changes to the input and output schemas tied to those DAGs.
The problem will arise when we end up with orphan schemas in the
database, schemas that have no DAG, because we will only keep the last
version of each DAG while keeping all versions of the schemas. Also, we
will not be able to offer multiple versions of the same DAG.
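
For what it's worth, a periodic check along these lines could at least
flag the orphans. This is only a rough sketch against the Airflow 2.0
stable REST API (GET /api/v1/dags), assuming basic-auth is enabled; the
AIRFLOW_URL value and the get_schema_dag_ids() helper are placeholders
for our own application:

import requests

AIRFLOW_URL = "http://localhost:8080/api/v1"  # placeholder deployment URL

def get_schema_dag_ids():
    # Placeholder: the dag_ids referenced by schemas in our own database.
    raise NotImplementedError

def find_orphan_schemas():
    # Ask Airflow which DAGs are currently deployed.
    resp = requests.get(AIRFLOW_URL + "/dags", auth=("admin", "admin"))
    resp.raise_for_status()
    live = {d["dag_id"] for d in resp.json()["dags"]}
    # Schemas whose DAG no longer exists are the orphans.
    return [s for s in get_schema_dag_ids() if s not in live]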

Still, I think this is a good enough solution.

Thanks again.




On Fri, Oct 16, 2020 at 11:03 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
wrote:

> Hello Franco,
>
> My 3 cents.
>
> I have another proposal for how to solve your problems. I think the API
> is not the best approach here; there are some best practices around DAG
> syncing that I strongly believe will become the most popular and most
> "stable" deployment pattern. We used this approach when we prepared a
> deployment strategy for a few of our customers, and we think it's best
> to build your DAG deployment strategy around git-sync.
>
> Simply speaking - rather than distributing the DAGs via the API or a
> shared file system, you commit your DAGs to a Git repository and let
> git-sync do the work of syncing them to the Airflow
> workers/schedulers/webserver. This has numerous advantages and, IMHO,
> no disadvantages:
>
> a) DAGs are Python code. It feels natural to keep DAGs in a Git repo.
> b) You can track history, origin, and authorship, and easily see what's
> changed in those DAGs.
> c) You can include a DAG verification process - both manual review and
> automated CI checks (for example via GitHub Actions). Git/GitHub
> provides out-of-the-box all the tools and automation you need for that.
> We've successfully implemented both a manual review process and
> sophisticated automated checks (for example, how long it takes to parse
> a DAG - see the sketch below) using GitHub Actions.
> d) In the Airflow Helm "chart" (the one from "master" sources - not yet
> officially released), you have full support for running the git-sync
> sidecar alongside your workers/scheduler/webserver (see the sample
> values below). Git-sync is a "first-class citizen" in the Airflow
> "ecosystem".
>
> J.
>
> On Fri, Oct 16, 2020 at 10:20 AM Franco Peschiera <
> franco.peschiera@gmail.com> wrote:
>
>> Hello James,
>>
>> Thanks for the detailed response. I completely understand.
>> We will check how the plugins work and probably make the modifications
>> ourselves. If we do end up automating the uploading in an Airflow fork,
>> we will create a PR in case it's interesting (I guess we just need to
>> see how / where the DAGs are serialized and stored in the database).
>> Alternatively, we may end up adapting our deploy flow to match the
>> current flow for uploading DAGs in Airflow.
>>
>> I'll give some context so you have a bit more detail on what we want to
>> do. It can definitely be the case that we're overdoing it, or doing
>> something wrong.
>> We want to match each of our Airflow DAGs with an input and an output
>> schema (a JSON format that is read like a marshmallow schema), both
>> stored outside Airflow in our own application. The three together
>> (DAG + input schema + output schema) make up a single app. When a user
>> calls the REST API in our app, they send the input data (hopefully
>> matching the input schema) and the DAG to run. We then store the input
>> data as JSON in our database and start a DagRun in Airflow, passing it
>> the id of the input data in our database so the DAG can read and write
>> it.
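>>
>> In case a concrete call clarifies the flow: this is roughly how we
>> would trigger the run once the input data is stored. It's only a
>> sketch against the (upcoming) 2.0 stable REST API; the "data_id" key,
>> the URL, and the credentials are our own placeholder conventions:
>>
>> import requests
>>
>> def trigger_run(dag_id: str, data_id: int) -> str:
>>     # Pass the id of the stored input data to the DAG via the run conf.
>>     resp = requests.post(
>>         f"http://airflow:8080/api/v1/dags/{dag_id}/dagRuns",
>>         json={"conf": {"data_id": data_id}},
>>         auth=("admin", "admin"),  # assuming basic-auth is configured
>>     )
>>     resp.raise_for_status()
>>     return resp.json()["dag_run_id"]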
>>
>> All to say, we want to be able to deploy DAG + input schema + output
>> schema in as simple and integrated a way as possible, so we avoid
>> discrepancies or errors. Since the DAGs are stored in Airflow and the
>> other two are stored in our application, we would ideally want to do
>> everything from our application and be able to upload the DAG to
>> Airflow in an automated way as part of our own deploy flow.
>>
>> Thanks again and take care,
>>
>> Franco Peschiera
>>
>> On Fri, Oct 16, 2020 at 6:17 AM James Timmins <james@astronomer.io>
>> wrote:
>>
>>> Hi Franco,
>>>
>>> I know it may seem strange that the API in 2.0 won't support uploading
>>> or substantially modifying DAGs (as you mentioned, there is an Update
>>> endpoint, but it is limited to pausing/unpausing DAGs). This is because the
>>> goal for the API, at least for now, is to have feature parity with the
>>> Airflow UI and CLI. Since DAG uploading isn't supported by those tools,
>>> it's out of scope for the 2.0 API. If you're curious about the goals and
>>> decisions behind the API, there's more info in the improvement proposal.
>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-32%3A+Airflow+REST+API
>>>
>>> While I haven't used any of the plugins myself, they are an option. You
>>> could also fork Airflow and add that functionality to the API yourself
>>> if you'd like.
>>>
>>> A reasonable next question is whether or not that functionality is
>>> planned for a future release. I haven't heard anything about that on the
>>> project roadmap, but others may have more insight there. In general, the
>>> focus has been entirely on getting 2.0 stable and shipped. There are some
>>> features planned for 2.1, but I don't think they're API related. Beyond
>>> that, we'll have to see what features are needed by the community.
>>>
>>> Hopefully that provides a bit of clarity into the confusing aspects of
>>> the API.
>>>
>>> Kind regards,
>>> James
>>>
>>>
>>> On Thu, Oct 15, 2020 at 2:33 PM Franco Peschiera <
>>> franco.peschiera@gmail.com> wrote:
>>>
>>>> wow, that's great! Thanks for the quick and positive response.
>>>>
>>>> One thing though: I did not find a way to write (POST) a DAG (i.e.,
>>>> upload a new DAG). Maybe for security reasons? (Although I see an
>>>> "Update a DAG" endpoint.)
>>>>
>>>> Thanks again.
>>>>
>>>> On Thu, Oct 15, 2020 at 11:19 PM Kaxil Naik <kaxilnaik@gmail.com>
>>>> wrote:
>>>>
>>>>> Airflow 2.0 will have a full-featured API:
>>>>> https://github.com/apache/airflow/blob/master/UPDATING.md#migration-guide-from-experimental-api-to-stable-api-v1
>>>>>
>>>>> API Spec & Details:
>>>>> https://airflow.readthedocs.io/en/latest/stable-rest-api-ref.html
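>>>>>
>>>>> As a taste of it, checking the status of a DagRun (one of the things
>>>>> you mentioned) becomes a single call against the stable API - a rough
>>>>> sketch with placeholder ids, assuming basic-auth is enabled:
>>>>>
>>>>> import requests
>>>>>
>>>>> # GET the state of one DagRun (Airflow 2.0 stable API).
>>>>> resp = requests.get(
>>>>>     "http://localhost:8080/api/v1/dags/my_dag/dagRuns/my_run_id",
>>>>>     auth=("admin", "admin"),
>>>>> )
>>>>> resp.raise_for_status()
>>>>> print(resp.json()["state"])  # e.g. "running", "success", "failed"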
>>>>>
>>>>>
>>>>> On Thu, Oct 15, 2020 at 10:09 PM Franco Peschiera <
>>>>> franco.peschiera@gmail.com> wrote:
>>>>>
>>>>>> Hello everyone,
>>>>>>
>>>>>> We're currently building a web app that makes use of Airflow to
>>>>>> delegate tasks. First of all, thanks for this excellent tool: it seems
>>>>>> it will save us a lot of time and headaches.
>>>>>>
>>>>>> I've been checking the REST API since we want to ideally communicate
>>>>>> exclusively this way. And yes, I know that functionality appears to be
>>>>>> new / recent (because of the "experimental" tag in the docs and the
>>>>>> URL). Having said that, there are some things that the REST API doesn't
>>>>>> do (yet?) that we would love to have: (1) upload a new DAG, (2) check
>>>>>> the status of a dagrun, among others.
>>>>>>
>>>>>> The REST API docs I'm reading:
>>>>>> https://airflow.apache.org/docs/stable/rest-api-ref.html
>>>>>>
>>>>>> On the other hand, I've found there are side projects / third party
>>>>>> plugins that do offer this functionality:
>>>>>> https://github.com/teamclairvoyant/airflow-rest-api-plugin
>>>>>>
>>>>>> So I have the following questions: (1) are there any plans to bring
>>>>>> the official REST API to parity with the CLI / Python interfaces? (2)
>>>>>> Is it a good idea to try third-party plugins for this? If so, do you
>>>>>> recommend a specific one?
>>>>>>
>>>>>> Thanks again!
>>>>>>
>>>>>> Franco
>>>>>>
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129
>
>
