airflow-users mailing list archives

From Jarek Potiuk <Jarek.Pot...@polidea.com>
Subject Re: extension of the REST API
Date Tue, 27 Oct 2020 20:01:12 GMT
I have no such powers: you can only unsubscribe yourself, following these instructions:
https://apache.org/foundation/mailinglists.html#subscribing

On Tue, Oct 27, 2020 at 6:42 PM Akeem <Akeem@moneybrag.com> wrote:

> please remove me from your subscriber list
>
> *Akeem Egbeyemi*
>
> *Founder at Moneybrag.com*
>
> Office-310-734-7668
>
> Mobile-213-200-2643
>
> akeem@moneybrag.com
> www.moneybrag.com
>
> ------------------------------
> *From:* Jarek Potiuk <Jarek.Potiuk@polidea.com>
> *Sent:* Tuesday, October 27, 2020 10:09 AM
> *To:* users@airflow.apache.org <users@airflow.apache.org>
> *Subject:* Re: extension of the REST API
>
> But the nice thing is that you can run verification of that at the PR
> level: a GitHub Action or any other CI check can go through all your
> DAGs and schemas and verify that they are consistent, only allowing
> the merge when all the tests pass.
>
> This is the biggest advantage of such a setup.
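> Such a PR-level check could be sketched roughly like this (a minimal,
> stdlib-only sketch; the repository layout -- `dags/<name>.py` paired with
> `schemas/<name>.input.json` and `schemas/<name>.output.json` -- and the
> function name are illustrative assumptions, not an Airflow convention):

```python
# Sketch of a CI consistency check: every DAG file must have a matching
# input and output schema file, otherwise the job exits non-zero and the
# PR cannot be merged.  Layout and naming are assumptions.
from pathlib import Path


def find_schema_gaps(dag_dir, schema_dir):
    """Return a list of 'dag: missing <kind> schema' messages."""
    gaps = []
    for dag_file in sorted(Path(dag_dir).glob("*.py")):
        name = dag_file.stem
        for kind in ("input", "output"):
            if not (Path(schema_dir) / f"{name}.{kind}.json").exists():
                gaps.append(f"{name}: missing {kind} schema")
    return gaps


if __name__ == "__main__":
    import sys
    problems = find_schema_gaps("dags", "schemas")
    for problem in problems:
        print(problem)
    sys.exit(1 if problems else 0)  # non-zero fails the CI job, blocking the merge
```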
>
> J,
>
> On Tue, Oct 27, 2020 at 5:56 PM Franco Peschiera <
> franco.peschiera@gmail.com> wrote:
>
> Thanks Jarek,
>
> (forgot to answer this last email).
>
> We're going to end up using git-sync and a GitHub repository to deploy
> our DAGs.
>
> I do see a couple of disadvantages with respect to our way of using
> Airflow. Namely, as we change the versions of our DAGs, we will have to
> make changes to the input and output schemas that are tied to those
> DAGs. The problem will arise when we end up with orphan schemas in the
> database that have no DAG, because we will only keep the latest version
> of each DAG while keeping all versions of the schemas. Also, we will not
> be able to offer multiple versions of the same DAG.
>
> Still, I think this is a good enough solution.
>
> Thanks again.
>
>
>
>
> On Fri, Oct 16, 2020 at 11:03 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
> wrote:
>
> Hello Franco,
>
> My 3 cents.
>
> I have another proposal on how to solve your problem. I think the API is
> not the best approach; there are some best practices around DAG syncing
> that I strongly believe will become the most popular and most "stable"
> deployment pattern. We've used this approach when preparing a deployment
> strategy for a few of our customers, and we think it's best to build
> your DAG deployment strategy around git-sync.
>
> Simply speaking, rather than distributing the DAGs via an API or a
> shared file system, you commit your DAGs to a Git repository and let
> git-sync do the work of syncing them to the Airflow
> workers/scheduler/webserver. This has numerous advantages and, IMHO, no
> disadvantages:
>
> a) DAGs are python code. It feels natural to keep DAGs in a Git repo
> b) You can track history, origin, authoring and easily see what's changed
> in those DAGs.
> c) You can include a DAG verification process - both manual review and
> automated CI checks (for example GitHub Actions) - Git/GitHub provides
> all the tools and automation you need for that out of the box. We've
> successfully implemented both a manual review process and sophisticated
> automated checks (for example, how long it takes to parse a DAG) using
> GitHub Actions
> d) In the Airflow Helm chart (the one from "master" sources - not yet
> officially released), you have full support for running the git-sync
> sidecar alongside your workers/scheduler/webserver. Git-sync is a
> "first-class citizen" in the Airflow ecosystem
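> One of the automated checks mentioned in (c), a DAG parse-time budget,
> could look roughly like this (in real CI you would more likely collect
> timings via airflow.models.DagBag; this stdlib-only stand-in just times
> how long each file takes to execute, and the threshold is illustrative):

```python
# Sketch of a parse-time budget check: execute each DAG file and fail if
# any takes longer than the budget.  A real check would use
# airflow.models.DagBag; the threshold here is an illustrative assumption.
import time
from pathlib import Path

PARSE_BUDGET_SECONDS = 2.0  # illustrative threshold


def time_parse(path):
    """Execute a Python file in an isolated namespace; return elapsed seconds."""
    source = Path(path).read_text()
    code = compile(source, str(path), "exec")
    start = time.perf_counter()
    exec(code, {"__name__": "__dag_check__"})
    return time.perf_counter() - start


def slow_dags(dag_dir, budget=PARSE_BUDGET_SECONDS):
    """Return (filename, elapsed) pairs for DAG files exceeding the budget."""
    return [
        (f.name, elapsed)
        for f in sorted(Path(dag_dir).glob("*.py"))
        if (elapsed := time_parse(f)) > budget
    ]
```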
>
> J.
>
> On Fri, Oct 16, 2020 at 10:20 AM Franco Peschiera <
> franco.peschiera@gmail.com> wrote:
>
> Hello James,
>
> Thanks for the detailed response. I completely understand.
> We will check how the plugins work and probably do the modifications
> ourselves. If we do end up automating the uploading in an airflow fork, we
> will create a PR in case it's interesting (I guess we just need to see how
> / where the DAGs are serialized and stored in the database).
> Alternatively, we may end up adapting our deploy flow to match the current
> flow for uploading DAGs in airflow.
>
> I'll give some context so you have a bit more detail on what we want to
> do. It can definitely be the case that we're overdoing it, or doing
> something wrong.
> We want to match each of our Airflow DAGs with an input and an output
> schema (a JSON format that is read as a marshmallow schema), both stored
> outside Airflow, in our own application. The three together (DAG + input
> schema + output schema) make a single app. Then, when a user calls the
> REST API in our app, they send the input data (hopefully matching the
> input schema) and the DAG to run. We then store the input data as JSON
> in our database and start a DAG run in Airflow, giving it the id of the
> input data in our database so it can read/write it.
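> A minimal sketch of that flow (the schema fields and the `data_id` key
> are assumptions about our own app, not Airflow details; the only Airflow
> part is the `conf` field of the stable POST /api/v1/dags/{dag_id}/dagRuns
> body, and in production the validation would use marshmallow rather than
> this toy type check):

```python
# Sketch: validate incoming data against a (simplified) input schema,
# then build the request body for Airflow's stable API endpoint
# POST /api/v1/dags/{dag_id}/dagRuns, passing our data id via `conf`.
# Schema format and key names are illustrative, not real API fields.
import json

INPUT_SCHEMA = {"customer_id": int, "items": list}  # toy stand-in for marshmallow


def validate(data, schema=INPUT_SCHEMA):
    """Raise ValueError if any schema field is missing or mistyped."""
    errors = [
        f"{key}: expected {typ.__name__}"
        for key, typ in schema.items()
        if not isinstance(data.get(key), typ)
    ]
    if errors:
        raise ValueError("; ".join(errors))
    return data


def dagrun_payload(data_id):
    """JSON body for POST /api/v1/dags/{dag_id}/dagRuns."""
    return json.dumps({"conf": {"data_id": data_id}})
```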
>
> All this to say: we want to be able to deploy the DAG + input schema +
> output schema in as simple and integrated a way as possible, so we avoid
> discrepancies or errors. Since the DAGs are stored in Airflow and the
> other two are stored in our application, we would ideally do everything
> from our application and upload the DAG to Airflow in an automated way
> as part of our own deploy flow.
>
> Thanks again and take care,
>
> Franco Peschiera
>
> On Fri, Oct 16, 2020 at 6:17 AM James Timmins <james@astronomer.io> wrote:
>
> Hi Franco,
>
> I know it may seem strange that the API in 2.0 won't support uploading
> or substantially modifying DAGs (as you mentioned, there is an Update
> endpoint, but it is limited to pausing/unpausing DAGs). This is because
> the goal for the API, at least for now, is to have feature parity with
> the Airflow UI and CLI. Since DAG uploading isn't supported by those
> tools, it's out of scope for the 2.0 API. If you're curious about the
> goals and decisions behind the API, there's more info in the improvement
> proposal:
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-32%3A+Airflow+REST+API
>
> While I haven't used any of the plugins, you could fork Airflow and add
> that functionality to the API yourself if you'd like.
>
> A reasonable next question is whether or not that functionality is planned
> for a future release. I haven't heard anything about that on the project
> roadmap, but others may have more insight there. In general, the focus has
> been entirely on getting 2.0 stable and shipped. There are some features
> planned for 2.1, but I don't think they're API related. Beyond that, we'll
> have to see what features are needed by the community.
>
> Hopefully that provides a bit of clarity into the confusing aspects of the
> API.
>
> Kind regards,
> James
>
>
> On Thu, Oct 15, 2020 at 2:33 PM Franco Peschiera <
> franco.peschiera@gmail.com> wrote:
>
> wow, that's great! Thanks for the quick and positive response.
>
> One thing though. I did not find a way to write (POST) a DAG (i.e., upload
> a new DAG). Maybe for security reasons? (Although I see an "Update a DAG"
> endpoint).
>
> Thanks again.
>
> On Thu, Oct 15, 2020 at 11:19 PM Kaxil Naik <kaxilnaik@gmail.com> wrote:
>
> Airflow 2.0 will have a full-featured API:
> https://github.com/apache/airflow/blob/master/UPDATING.md#migration-guide-from-experimental-api-to-stable-api-v1
>
> API Spec & Details:
> https://airflow.readthedocs.io/en/latest/stable-rest-api-ref.html
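> For example, checking the status of a DAG run (one of the gaps mentioned
> earlier in this thread) becomes a plain GET against the stable API. A
> rough urllib sketch, where the base URL and credentials are placeholders
> and basic auth is assumed to be enabled in the deployment:

```python
# Sketch: read a DAG run's state via the stable REST API endpoint
# GET /api/v1/dags/{dag_id}/dagRuns/{dag_run_id}.
# The base URL and credentials below are placeholders.
import base64
import json
import urllib.request


def dag_run_url(base, dag_id, run_id):
    """Build the stable-API URL for one DAG run."""
    return f"{base}/api/v1/dags/{dag_id}/dagRuns/{run_id}"


def get_dag_run_state(base, dag_id, run_id, user, password):
    """Fetch the run and return its state, e.g. 'queued', 'running', 'success'."""
    request = urllib.request.Request(dag_run_url(base, dag_id, run_id))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)["state"]
```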
>
>
> On Thu, Oct 15, 2020 at 10:09 PM Franco Peschiera <
> franco.peschiera@gmail.com> wrote:
>
> Hello everyone,
>
> We're currently building a web app that makes use of airflow to delegate
> tasks. First of all, thanks for this excellent tool: it seems it will save
> us a lot of time and headaches.
>
> I've been checking the REST API, since we would ideally like to
> communicate exclusively this way. And yes, I know that functionality
> appears to be new/recent (because of the "experimental" tag in the docs
> and the URL). Having said that, there are some things the REST API
> doesn't do (yet?) that we would love to have: (1) uploading a new DAG
> and (2) checking the status of a DAG run, among others.
>
> The REST API docs I'm reading:
> https://airflow.apache.org/docs/stable/rest-api-ref.html
>
> On the other hand, I've found there are side projects / third-party
> plugins that do offer this functionality:
> https://github.com/teamclairvoyant/airflow-rest-api-plugin
>
> So I have the following questions: (1) Are there any plans to complete
> the official REST API to match the CLI / Python functionality? (2) Is it
> a good idea to try third-party plugins for this? If so, do you recommend
> a specific one?
>
> Thanks again!
>
> Franco
>
>
>

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>
