airflow-users mailing list archives

From Akeem <Ak...@moneybrag.com>
Subject Re: extension of the REST API
Date Tue, 27 Oct 2020 17:42:13 GMT
Please remove me from your subscriber list.


Akeem Egbeyemi

Founder at Moneybrag.com

Office: 310-734-7668

Mobile: 213-200-2643

akeem@moneybrag.com

www.moneybrag.com

________________________________
From: Jarek Potiuk <Jarek.Potiuk@polidea.com>
Sent: Tuesday, October 27, 2020 10:09 AM
To: users@airflow.apache.org
Subject: Re: extension of the REST API

But the nice thing is that you can run verification of that at the PR level. You can run
a GitHub Action or any other CI check that goes through all your DAGs and schemas and
verifies that they are consistent, only allowing the merge when all the tests pass.

This is the biggest advantage of such a setup.
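
For example, a minimal sketch of such a check, runnable with pytest in CI (the dags/ and
schemas/<dag_id>_input.json layout is my own assumption, not an Airflow convention):

import json
from pathlib import Path

from airflow.models import DagBag

DAGS_DIR = Path("dags")        # assumed repo layout
SCHEMAS_DIR = Path("schemas")  # assumed repo layout


def test_dags_and_schemas_are_consistent():
    # Fail the PR if any DAG file does not even parse.
    dag_bag = DagBag(dag_folder=str(DAGS_DIR), include_examples=False)
    assert not dag_bag.import_errors, dag_bag.import_errors

    # Every DAG must ship a valid input and output schema...
    dag_ids = set(dag_bag.dag_ids)
    for dag_id in dag_ids:
        for kind in ("input", "output"):
            schema_path = SCHEMAS_DIR / f"{dag_id}_{kind}.json"
            assert schema_path.exists(), f"missing schema: {schema_path}"
            json.loads(schema_path.read_text())  # must at least be valid JSON

    # ...and no schema may be left orphaned without a matching DAG.
    orphans = {p.stem.rsplit("_", 1)[0] for p in SCHEMAS_DIR.glob("*.json")} - dag_ids
    assert not orphans, f"schemas with no matching DAG: {orphans}"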

J.

On Tue, Oct 27, 2020 at 5:56 PM Franco Peschiera <franco.peschiera@gmail.com>
wrote:
Thanks Jarek,

(I forgot to answer this last email.)

We're going to end up using git-sync and a GitHub repository to deploy our DAGs.

I do see a couple of disadvantages with respect to our way of using Airflow. Namely, as we
release new versions of our DAGs, we will have to make matching changes to the input and output
schemas tied to those DAGs. The problem will arise when we end up with orphan schemas in the
database that have no DAG, because we will only keep the latest version of each DAG while we
keep all versions of the schemas. Also, we will not be able to offer multiple versions of the
same DAG.

Still, I think this is a good enough solution.

Thanks again.




On Fri, Oct 16, 2020 at 11:03 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
wrote:
Hello Franco,

My 3 cents.

I have another proposal on how to solve your problems. I think the API is not the best approach;
there are some best practices around DAG syncing that I strongly believe will become the
most popular and most "stable" deployments. We've used this approach when we prepared a deployment
strategy for a few of our customers, and we think it's best to build your DAG deployment strategy
around git-sync.

Simply speaking - rather than distributing the DAGs via an API or a shared file system, you commit
your DAGs to a Git repository and let git-sync do the work of syncing them with the Airflow workers/schedulers/webserver.
This has numerous advantages and no disadvantages IMHO:

a) DAGs are Python code. It feels natural to keep DAGs in a Git repo.
b) You can track history, origin, and authorship, and easily see what has changed in those DAGs.
c) You can include a DAG verification process, both manual review and automated CI checks (for
example GitHub Actions) - Git/GitHub provides out-of-the-box all the tools and automation
you need for that. We've successfully implemented both a manual review process and sophisticated
automated checks (for example, how long it takes to parse a DAG) using GitHub Actions; see
the sketch after this list.
d) In the Airflow "chart" (the one from "master" sources, not yet officially released),
you have full support for the git-sync sidecar running alongside your workers/scheduler/webserver.
Git-sync is a "first-class citizen" in the Airflow "ecosystem".

J.

On Fri, Oct 16, 2020 at 10:20 AM Franco Peschiera <franco.peschiera@gmail.com>
wrote:
Hello James,

Thanks for the detailed response. I completely understand.
We will check how the plugins work and probably make the modifications ourselves. If we do end
up automating the upload in an Airflow fork, we will create a PR in case it's interesting
(I guess we just need to see how/where the DAGs are serialized and stored in the database).
Alternatively, we may end up adapting our deploy flow to match the current flow for uploading
DAGs in Airflow.

I'll give some context so you have a bit more detail on what we want to do. It may well be
that we're overdoing it, or doing something wrong.
We want to match each of our Airflow DAGs with an input and an output schema (a JSON format that
is read like a marshmallow schema) that are stored outside Airflow, in our own application.
The three (DAG + input schema + output schema) make a single app. Then, when a user calls
the REST API in our app, he/she sends the input data (hopefully matching the input schema)
and the DAG to run. We then store the input data as JSON in our database and start a DAG run
in Airflow, giving it the ID of the input data in our database so it can read/write.
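
To make the flow concrete, something like the following sketch is what I have in mind
(InputSchema, the in-memory _DB, and the Airflow URL/credentials are all placeholders of
ours; the dagRuns endpoint is the one from the 2.0 stable API spec):

import requests
from marshmallow import Schema, fields

AIRFLOW_URL = "http://localhost:8080"  # placeholder
_DB = {}  # stand-in for our application's database


class InputSchema(Schema):
    # Hypothetical input schema for one app.
    demand = fields.List(fields.Float(), required=True)
    horizon = fields.Integer(required=True)


def save_input(data):
    # Store the validated input as JSON in our database; return its id.
    input_id = len(_DB) + 1
    _DB[input_id] = data
    return input_id


def run_app(dag_id, payload):
    data = InputSchema().load(payload)  # raises ValidationError if it doesn't match
    input_id = save_input(data)
    # Airflow 2.0 stable API: POST /api/v1/dags/{dag_id}/dagRuns with a "conf" payload.
    resp = requests.post(
        f"{AIRFLOW_URL}/api/v1/dags/{dag_id}/dagRuns",
        json={"conf": {"input_id": input_id}},
        auth=("user", "pass"),  # placeholder credentials
    )
    resp.raise_for_status()
    return resp.json()["dag_run_id"]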

All this to say, we want to be able to deploy DAG + input schema + output schema in as simple
and integrated a way as possible, so we avoid discrepancies or errors. Since the DAGs are stored
in Airflow and the other two are stored in our application, we would ideally want to do everything
from our application and upload the DAG to Airflow in an automated way as part of our own
deploy flow.

Thanks again and take care,

Franco Peschiera

On Fri, Oct 16, 2020 at 6:17 AM James Timmins <james@astronomer.io>
wrote:
Hi Franco,

I know it may seem strange that the API in 2.0 won't support uploading or substantially modifying
DAGs (as you mentioned, there is an Update endpoint, but it is limited to pausing/unpausing
DAGs). This is because the goal for the API, at least for now, is to have feature parity with
the Airflow UI and CLI. Since DAG uploading isn't supported by those tools, it's out of scope
for the 2.0 API. If you're curious about the goals and decisions behind the API, there's more
info in the improvement proposal: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-32%3A+Airflow+REST+API

While I haven't used any of the plugins, you could fork Airflow and add that functionality
to the API yourself if you'd like.

A reasonable next question is whether that functionality is planned for a future release.
I haven't heard anything about it on the project roadmap, but others may have more insight
there. In general, the focus has been entirely on getting 2.0 stable and shipped. There are
some features planned for 2.1, but I don't think they're API-related. Beyond that, we'll have
to see what features the community needs.

Hopefully that provides a bit of clarity into the confusing aspects of the API.

Kind regards,
James


On Thu, Oct 15, 2020 at 2:33 PM Franco Peschiera <franco.peschiera@gmail.com>
wrote:
Wow, that's great! Thanks for the quick and positive response.

One thing, though: I did not find a way to write (POST) a DAG (i.e., upload a new DAG). Maybe
for security reasons? (Although I do see an "Update a DAG" endpoint.)

Thanks again.

On Thu, Oct 15, 2020 at 11:19 PM Kaxil Naik <kaxilnaik@gmail.com>
wrote:
Airflow 2.0 will have a full-featured API: https://github.com/apache/airflow/blob/master/UPDATING.md#migration-guide-from-experimental-api-to-stable-api-v1

API Spec & Details: https://airflow.readthedocs.io/en/latest/stable-rest-api-ref.html


On Thu, Oct 15, 2020 at 10:09 PM Franco Peschiera <franco.peschiera@gmail.com>
wrote:
Hello everyone,

We're currently building a web app that uses Airflow to delegate tasks. First of all,
thanks for this excellent tool: it seems it will save us a lot of time and headaches.

I've been checking the REST API, since we would ideally like to communicate exclusively this way.
And yes, I know that functionality appears to be new/recent (because of the "experimental"
tag in the docs and the URL). Having said that, there are some things that the REST API doesn't
do (yet?) that we would love to have: (1) upload a new DAG, (2) check the status of a DAG run,
among others.

The REST API docs I'm reading: https://airflow.apache.org/docs/stable/rest-api-ref.html

On the other hand, I've found that there are side projects / third-party plugins that do offer
this functionality: https://github.com/teamclairvoyant/airflow-rest-api-plugin

So I have the following questions: (1) are there any plans to complete the official REST
API so it matches the CLI / Python ones? (2) is it a good idea to try third-party plugins for this?
If so, do you recommend a specific one?

Thanks again!

Franco


--
Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129




