airflow-users mailing list archives

From Franco Peschiera <franco.peschi...@gmail.com>
Subject Re: Stopping / killing dag runs gracefully
Date Tue, 23 Feb 2021 17:22:20 GMT
Thanks Ash.

Yes, we can attempt some PRs to make it happen. I'll check how the GUI
currently updates a DAGrun's state. If I have any questions, I'll ask for
help in Slack.

regards,

Franco

On Tue, Feb 23, 2021 at 3:35 PM Ash Berlin-Taylor <ash@apache.org> wrote:

> Hi Franco,
>
> Your use case definitely sounds like a valid one.
>
> I personally wouldn't approach this by deleting the DAG run, as I don't like
> to delete things where possible (though the fact that deleting it doesn't
> cancel in-progress tasks is a bug in itself) -- instead I would say there
> should be an endpoint to set the state of the DAG run to success/failed, or
> perhaps to a new state of "cancelled", and that should cascade down to the
> running Task Instances.
>
> Would you be up for attempting some PRs to make some/all of this happen?
> We're happy to help guide you through and get this merged as it sounds
> great.
>
> Ash
> (Airflow PMC member)
>
> On Tue, 23 Feb, 2021 at 15:15, Franco Peschiera <
> franco.peschiera@gmail.com> wrote:
>
> Hello again everyone,
>
> We've continued researching the possibility of interrupting Airflow DAG
> runs by command (from outside the Airflow GUI). I put our notes on this
> research below. As I said in my previous email, I'm not sure this is
> something the developers would like to have as functionality inside
> Airflow. We are, of course, willing to discuss this and see if we can help
> make it happen. If you prefer, I can write instead to the developers
> mailing list or Slack channel.
>
> In case you're curious what we're using airflow for, you can check the
> project here https://github.com/baobabsoluciones/corn or here
> https://baobabsoluciones.github.io/corn/
>
> In our deployments using the CeleryExecutor, we have the following flow for
> communicating new tasks to the workers.
>
> [image: image.png]
>
>
> According to it, we have three options: we can stop the tasks from the
> Airflow Scheduler, from the Flower worker, or write a new state directly
> into the database. We want to discard this last option as it is too
> intrusive and not very traceable.
>
> Terminate tasks from Airflow GUI
> The best option to stop both the DAGrun and the Flower task execution is to
> mark the DAGrun as "Failed" from the Airflow GUI. When this happens, it
> automatically revokes all pending Flower tasks on the workers. However, the
> REST API offers no method (such as PUT or PATCH) to change the state of
> DAGruns. Such an endpoint would be the ideal option for the task of killing
> all pending processes.
>
> [image: image.png]
>
>
> The operation of deleting the DAGrun does not end the Flower tasks; even
> so, we can perform the operation with the Airflow REST API as follows:
>
> Delete Airflow DAGrun (with DagRun id -> $dagrun_id)
> curl -X DELETE "https://devsm.cornflow.baobabsoluciones.app/devsm/airflow/api/v1/dags/timer/dagRuns/$dagrun_id" -H 'Content-Type: application/json' --user "admin:admin"
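>
> In case we do not know the DagRun id beforehand, we can first list the runs
> of the DAG and take the id from the response (just a sketch, using the same
> base URL and credentials as above):
>
> List Airflow DAGruns (with dag id -> timer)
> curl -X GET "https://devsm.cornflow.baobabsoluciones.app/devsm/airflow/api/v1/dags/timer/dagRuns" -H 'Content-Type: application/json' --user "admin:admin"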
>
> Terminate tasks from Flower API or GUI
> Flower allows changing the status of tasks through its REST API. As seen
> below, we can abort or terminate the tasks as follows:
>
> Terminate tasks in Flower REST (with task id -> $root_id)
> curl -X POST "https://devsm.cornflow.baobabsoluciones.app/devsm/flower/api/task/revoke/$root_id?terminate=true" --user "admin:admin"
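>
> To find the $root_id in the first place, Flower also exposes an endpoint
> that lists the tasks it knows about (again just a sketch, with the same
> base URL and credentials):
>
> List tasks in Flower REST
> curl -X GET "https://devsm.cornflow.baobabsoluciones.app/devsm/flower/api/tasks" --user "admin:admin"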
>
> We can also terminate the tasks successfully from the Flower GUI.
>
> [image: image.png]
>
>
> The problem when terminating tasks from Flower is that they continue to
> appear as running in Airflow for an indeterminate time, and new tasks are
> not sent to the workers even though the workers are available.
>
> thanks,
>
> Franco
>
> On Fri, Jan 22, 2021 at 11:23 AM Franco Peschiera <
> franco.peschiera@gmail.com> wrote:
>
>> Thanks for the answer, Anton.
>> It definitely makes sense. In fact, that's what we already do: we always
>> pass a time limit to the optimization task, and it is respected. But we
>> still want to keep the possibility of stopping the run before the time
>> limit has been reached. Tasks can have a time limit from several seconds
>> to several hours (or days), and some example reasons why we may want to
>> interrupt an ongoing task before the time limit are:
>>
>> * the user realizes the input data is wrong, and so it doesn't make sense
>> to keep running the task.
>> * the log from the optimization task process shows that the task is not
>> going according to plan, and so the user wants to interrupt it to check why
>> and send another one.
>> * some input data or hypothesis has changed and the ongoing optimization
>> task is obsolete.
>>
>> thanks again!
>>
>> Franco
>>
>>
>> On Fri, Jan 22, 2021 at 10:29 AM Anton Erholt <
>> anton.erholt@epidemicsound.com> wrote:
>>
>>> Apologies for not answering your question about how to stop DAG runs; I
>>> do not know that. However, I wonder if it would make sense to pass along
>>> how long the job should run as a parameter to the optimization task, and
>>> when it times out, exit/return appropriately so Airflow can read it?
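>>>
>>> For example (only a sketch; it assumes the optimization task reads a time
>>> limit from the DAG run conf, and the host, DAG id and parameter name are
>>> placeholders), the limit could be passed when triggering the run through
>>> the REST API:
>>>
>>> Trigger a DAGrun with a time limit in its conf
>>> curl -X POST "https://<airflow-host>/api/v1/dags/<dag_id>/dagRuns" -H 'Content-Type: application/json' --user "admin:admin" -d '{"conf": {"time_limit_seconds": 3600}}'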
>>>
>>> Best,
>>> Anton
>>>
>>> On Fri, 22 Jan 2021, 08:01 Franco Peschiera <franco.peschiera@gmail.com>
>>> wrote:
>>>
>>>> Hello again everyone,
>>>>
>>>> As the title says: I would like to be able to stop / kill a dagrun.
>>>> I’ve seen this question asked here
>>>> <https://stackoverflow.com/questions/43631693/how-to-stop-kill-airflow-tasks-from-the-ui>
>>>> and here
>>>> <https://stackoverflow.com/questions/49039386/how-do-i-stop-an-airflow-dag>
>>>> .
>>>> Several solutions are proposed but I was wondering if there is a
>>>> “correct” way to stop a dagrun. I’m guessing from the Airflow 2.0 docs of
>>>> the REST API that it is probably not possible from there (since I did not
>>>> see it there). And since I do not see it anywhere in the docs, I fear there may
>>>> not be a good way to do that properly. Is that so? Is there even an
>>>> “improper” way?
>>>>
>>>> As context, the tasks that we want to schedule are optimization tasks
>>>> that do not have a fixed time to run. Users usually put a time limit, e.g.,
>>>> an hour, and we would anyway put one by default if they don’t. But, in
>>>> general, users may want to stop an execution if they see it takes too long
>>>> or if they want to change something before running it again. So scheduling
>>>> and stopping dagruns should be a “common” thing to do.
>>>>
>>>> Thanks as always!
>>>>
>>>> Franco
>>>>
>>>
