airflow-users mailing list archives

From Jarek Potiuk <Jarek.Pot...@polidea.com>
Subject Re: extension of the REST API
Date Fri, 16 Oct 2020 09:02:51 GMT
Hello Franco,

My 3 cents.

I have another proposal for how to solve your problem. I think the API is
not the best approach; there are some best practices around DAG syncing
that I strongly believe will become the most popular and most "stable"
deployment pattern. We used this approach when we prepared a deployment
strategy for a few of our customers, and we think it's best to build your
DAG deployment strategy around git-sync.

Simply speaking - rather than distributing the DAGs via an API or a shared
file-system, you commit your DAGs to a Git repository and let git-sync do
the work of syncing them with the Airflow workers/scheduler/webserver. This
has numerous advantages and, IMHO, no disadvantages:

a) DAGs are python code. It feels natural to keep DAGs in a Git repo
b) You can track history, origin, and authorship, and easily see what's
changed in those DAGs.
c) You can include a DAG verification process - both manual review and
automated CI checks (for example via GitHub Actions) - Git/GitHub provides
out-of-the-box all the tools and automation you need for that. We've
successfully implemented both a manual review process and sophisticated
automated checks (for example, how long it takes to parse a DAG) using
GitHub Actions; see the sketch after this list.
d) In the "chart" of Airflow (the one from "master" sources - not yet
officially released), you have full support for the git-sync sidecar to run
alongside your workers/scheduler/webserver. Git-sync is
"first-class-citizen" in the Airflow "ecosystem"

J.

On Fri, Oct 16, 2020 at 10:20 AM Franco Peschiera <
franco.peschiera@gmail.com> wrote:

> Hello James,
>
> Thanks for the detailed response. I completely understand.
> We will check how the plugins work and will probably make the
> modifications ourselves. If we do end up automating the upload in an
> airflow fork, we will create a PR in case it's of interest (I guess we
> just need to see how / where the DAGs are serialized and stored in the
> database). Alternatively, we may end up adapting our deploy flow to match
> the current flow for uploading DAGs in airflow.
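>
> As a starting point for that (purely a guess on our side - the internal
> API may well differ between airflow versions), it looks like something
> like SerializedDagModel is where serialized DAGs end up in the database:
>
>     from airflow.models import DagBag
>     from airflow.models.serialized_dag import SerializedDagModel
>
>     # load DAG files and write their serialized form to the metadata db
>     dagbag = DagBag(dag_folder="dags", include_examples=False)
>     for dag in dagbag.dags.values():
>         SerializedDagModel.write_dag(dag)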
>
> I'll give some context so you have a bit more detail on what we want to
> do. It may well be the case that we're overdoing it, or doing something
> wrong.
> We want to match each of our airflow DAGs with an input and an output
> schema (a json format that is read like a marshmallow schema), both stored
> outside airflow, in our own application. The three (DAG + input schema +
> output schema) make a single app. Then, when a user calls the REST API in
> our app, he/she sends the input data (hopefully matching the input schema)
> and the DAG to run. We then store the input data as json in our database
> and start a dagrun in airflow, passing it the id of the input data in our
> database so the DAG can read/write it.
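>
> For illustration, a rough sketch of that flow (the schema fields, URL and
> credentials below are made up; the dagRuns endpoint comes from the stable
> v1 API spec that Kaxil linked):
>
>     import requests
>     from marshmallow import Schema, fields
>
>     class InputSchema(Schema):
>         # hypothetical input schema for one of our apps
>         coefficients = fields.List(fields.Float(), required=True)
>         time_limit = fields.Integer(required=True)
>
>     def launch(dag_id: str, payload: dict, input_data_id: int) -> dict:
>         # validate the user's input data against the app's input schema;
>         # raises ValidationError if the payload doesn't match
>         data = InputSchema().load(payload)
>         # ... store `data` as json in our own database here ...
>         # then trigger a dagrun, passing the stored input data's id in conf
>         resp = requests.post(
>             f"http://airflow:8080/api/v1/dags/{dag_id}/dagRuns",
>             json={"conf": {"input_data_id": input_data_id}},
>             auth=("user", "password"),
>         )
>         resp.raise_for_status()
>         return resp.json()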
>
> All to say, we want to be able to deploy DAG + input schema + output
> schema in as simple and integrated a way as possible, so that we avoid
> errors. Since the DAGs are stored in airflow and the other two are stored
> in our application, we would ideally want to do everything from our
> application and be able to upload the DAG to airflow in an automated way as
> part of our own deploy flow.
>
> Thanks again and take care,
>
> Franco Peschiera
>
> On Fri, Oct 16, 2020 at 6:17 AM James Timmins <james@astronomer.io> wrote:
>
>> Hi Franco,
>>
>> I know it may seem strange that the API in 2.0 won't support uploading or
>> substantially modifying DAGs (as you mentioned, there is an Update
>> endpoint, but it is limited to pausing/unpausing DAGs). This is because the
>> goal for the API, at least for now, is to have feature parity with the
>> Airflow UI and CLI. Since DAG uploading isn't supported by those tools,
>> it's out of scope for the 2.0 API. If you're curious about the goals and
>> decisions behind the API, there's more info in the improvement proposal.
>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-32%3A+Airflow+REST+API
>>
>> While I haven't used any of the plugins, you could fork Airflow and add
>> that functionality to the API yourself if you'd like.
>>
>> A reasonable next question is whether or not that functionality is
>> planned for a future release. I haven't heard anything about that on the
>> project roadmap, but others may have more insight there. In general, the
>> focus has been entirely on getting 2.0 stable and shipped. There are some
>> features planned for 2.1, but I don't think they're API related. Beyond
>> that, we'll have to see what features are needed by the community.
>>
>> Hopefully that provides a bit of clarity into the confusing aspects of
>> the API.
>>
>> Kind regards,
>> James
>>
>>
>> On Thu, Oct 15, 2020 at 2:33 PM Franco Peschiera <
>> franco.peschiera@gmail.com> wrote:
>>
>>> wow, that's great! Thanks for the quick and positive response.
>>>
>>> One thing though. I did not find a way to write (POST) a DAG (i.e.,
>>> upload a new DAG). Maybe for security reasons? (Although I see an "Update a
>>> DAG" endpoint).
>>>
>>> Thanks again.
>>>
>>> On Thu, Oct 15, 2020 at 11:19 PM Kaxil Naik <kaxilnaik@gmail.com> wrote:
>>>
>>>> Airflow 2.0 will have a full-featured API:
>>>> https://github.com/apache/airflow/blob/master/UPDATING.md#migration-guide-from-experimental-api-to-stable-api-v1
>>>>
>>>> API Spec & Details:
>>>> https://airflow.readthedocs.io/en/latest/stable-rest-api-ref.html
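>>>>
>>>> For example, checking the status of a dagrun would look roughly like
>>>> this - a sketch, with host, credentials and ids as placeholders; see
>>>> the spec above for the exact response fields:
>>>>
>>>>     import requests
>>>>
>>>>     resp = requests.get(
>>>>         "http://airflow:8080/api/v1/dags/my_dag/dagRuns/my_run_id",
>>>>         auth=("user", "password"),
>>>>     )
>>>>     resp.raise_for_status()
>>>>     print(resp.json()["state"])  # e.g. "queued", "running", "success"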
>>>>
>>>>
>>>> On Thu, Oct 15, 2020 at 10:09 PM Franco Peschiera <
>>>> franco.peschiera@gmail.com> wrote:
>>>>
>>>>> Hello everyone,
>>>>>
>>>>> We're currently building a web app that makes use of airflow to
>>>>> delegate tasks. First of all, thanks for this excellent tool: it seems it
>>>>> will save us a lot of time and headaches.
>>>>>
>>>>> I've been checking the REST API since we would ideally like to communicate
>>>>> exclusively this way. And yes, I know that functionality appears to be new
>>>>> / recent (because of the "experimental" tag in the docs and the URL).
>>>>> Having said that, there are some things that the REST API doesn't do (yet?)
>>>>> that we would love to have: (1) upload a new DAG, (2) check the status of a
>>>>> dagrun, among others.
>>>>>
>>>>> The REST API docs I'm reading:
>>>>> https://airflow.apache.org/docs/stable/rest-api-ref.html
>>>>>
>>>>> On the other hand, I've found there are side projects / third party
>>>>> plugins that do offer this functionality:
>>>>> https://github.com/teamclairvoyant/airflow-rest-api-plugin
>>>>>
>>>>> So I have the following questions: (1) are there any plans to bring the
>>>>> official REST API up to parity with the CLI / Python interfaces? (2) is it
>>>>> a good idea to try third-party plugins for this? If so, do you recommend a
>>>>> specific one?
>>>>>
>>>>> Thanks again!
>>>>>
>>>>> Franco
>>>>>

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129
