airflow-users mailing list archives

From Franco Peschiera <franco.peschi...@gmail.com>
Subject Re: Stopping / killing dag runs gracefully
Date Tue, 23 Feb 2021 22:17:04 GMT
Hello again,

Ok, so while looking into the code to actually make the change, I discovered
there is already a way to do this from the current REST API; I just had not
found it before.

It appears the updateTaskInstancesState endpoint is what we need. We just
need to know the “dag+task+execution_date” for our dagrun and change the
status of that task. I had not found this endpoint before because for some
reason it is included inside the DAG section of the API (it’s the last
one). My reflex would be “everything under DAG goes at a DAG-level of
detail” but in fact, updateTaskInstancesState asks for an execution_date
(among other things) in the request body of the endpoint so it looks more
like a “dag_run state editor” than a “task state editor” or “dag state
editor”. Also, the name of the endpoint does not help much.

From the tests I’ve done, it appears to work as expected: changing the task
state to "success" or "failed" terminates the execution. Can someone
confirm this endpoint is indeed used to change the status of tasks and thus
stop the execution of the dagruns they belong to?
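For reference, here is a minimal sketch of such a call against a local
Airflow 2.0 webserver with basic auth. The dag_id ("timer"), task_id
("solve"), execution_date, and credentials are placeholders for
illustration; adjust them to your deployment. Setting "dry_run" to true
makes the endpoint only preview which task instances would be affected:

```shell
# Hypothetical values -- replace with your own deployment's details.
AIRFLOW_URL="http://localhost:8080/api/v1"
DAG_ID="timer"

# Request body for updateTaskInstancesState: execution_date identifies the
# dagrun, task_id the task whose state we want to set. With dry_run=true
# the API only lists the affected task instances, changing nothing.
BODY='{
  "dry_run": true,
  "task_id": "solve",
  "execution_date": "2021-02-23T00:00:00+00:00",
  "include_upstream": false,
  "include_downstream": false,
  "include_future": false,
  "include_past": false,
  "new_state": "failed"
}'
echo "$BODY"

# The actual call (needs a running webserver; set dry_run to false to apply):
# curl -X POST "$AIRFLOW_URL/dags/$DAG_ID/updateTaskInstancesState" \
#      -H 'Content-Type: application/json' --user "admin:admin" -d "$BODY"
```

Once the dry run shows the expected task instances, the same body with
"dry_run": false and "new_state": "failed" (or "success") performs the
actual state change.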

thanks again,

Franco

On Tue, Feb 23, 2021 at 6:22 PM Franco Peschiera <franco.peschiera@gmail.com>
wrote:

> Thanks Ash.
>
> Yes, we can attempt some PRs to make it happen. I'll check how the GUI now
> allows the update of dagrun's state. If I have any questions, I'll look for
> help in slack.
>
> regards,
>
> Franco
>
> On Tue, Feb 23, 2021 at 3:35 PM Ash Berlin-Taylor <ash@apache.org> wrote:
>
>> Hi Franco,
>>
>> Your use case definitely sounds like a valid one.
>>
>> I personally wouldn't approach this by deleting the DAG run, as I prefer
>> not to delete things where possible (though the fact that deleting it
>> doesn't cancel in-progress tasks is a bug in itself) -- instead I would say there
>> should be an endpoint to set the state of the DAG run to success/failed, or
>> perhaps to a new state of "cancelled", and that should cascade down to the
>> running Task Instances.
>>
>> Would you be up for attempting some PRs to make some/all of this happen?
>> We're happy to help guide you through and get this merged as it sounds
>> great.
>>
>> Ash
>> (Airflow PMC member)
>>
>> On Tue, 23 Feb, 2021 at 15:15, Franco Peschiera <
>> franco.peschiera@gmail.com> wrote:
>>
>> Hello again everyone,
>>
>> We've continued researching the possibility of interrupting Airflow dag
>> runs by command (from outside the Airflow GUI). I put our notes on this
>> research below. As I said in my previous email, I'm not sure this is
>> something the developers would like to have as functionality inside
>> Airflow. We are, of course, willing to discuss this and see if we can
>> help make it happen. If you prefer, I can write instead to the developers
>> mailing list or Slack channel.
>>
>> In case you're curious what we're using airflow for, you can check the
>> project here https://github.com/baobabsoluciones/corn or here
>> https://baobabsoluciones.github.io/corn/
>>
>> In our deployments using CeleryExecutor we have the following flow of
>> communication of new tasks to workers.
>>
>> [image: image.png]
>>
>>
>> According to it, we have three options: we can stop the tasks from the
>> Airflow Scheduler, from the Flower worker, or write a new state directly
>> into the database. We want to discard this last option as it is too
>> intrusive and not very traceable.
>>
>> Terminate tasks from Airflow GUI
>> The best option to stop all DAGrun and Flower task execution is to mark
>> the DAGs as "Failed" from the Airflow GUI. When this happens, it
>> automatically revokes all pending Flower tasks on workers. However, the
>> REST API exposes no PUT or UPDATE method to change the state of DAGruns.
>> This would be the ideal option for killing all pending processes.
>>
>> [image: image.png]
>>
>>
>> The operation of deleting the DAGrun does not end the Flower tasks; even
>> so, we can perform the operation with the Airflow REST API as follows:
>>
>> Delete Airflow DAGrun (with DagRun id -> $dagrun_id)
>> curl -X DELETE "https://devsm.cornflow.baobabsoluciones.app/devsm/airflow/api/v1/dags/timer/dagRuns/$dagrun_id" \
>>   -H 'Content-Type: application/json' --user "admin:admin"
>>
>> Terminate tasks from Flower API or GUI
>> Flower allows changing the status of tasks through the REST API. As seen
>> below, we can abort or terminate the tasks as follows:
>>
>> Terminate tasks in Flower REST (with task id -> $root_id)
>> curl -X POST "https://devsm.cornflow.baobabsoluciones.app/devsm/flower/api/task/revoke/$root_id?terminate=true" \
>>   --user "admin:admin"
>>
>> We can also finish the tasks from Flower GUI satisfactorily.
>>
>> [image: image.png]
>>
>>
>> The problem when finishing tasks from Flower is that they continue to run
>> in Airflow for an indeterminate time and new tasks are not sent to workers
>> even though they are available.
>>
>> thanks,
>>
>> Franco
>>
>> On Fri, Jan 22, 2021 at 11:23 AM Franco Peschiera <
>> franco.peschiera@gmail.com> wrote:
>>
>>> Thanks for the answer Anton.
>>> It definitely makes sense. In fact, that's what we already do: we always
>>> pass a time limit to the optimization task and it's respected. But we still
>>> want to reserve the possibility to stop the run before the time limit has
>>> been reached. Tasks can have a time limit ranging from several seconds
>>> to several hours (or days), and some example reasons why we may want to
>>> interrupt an ongoing task before the time limit are:
>>>
>>> * the user realizes the input data is wrong, and so it doesn't make
>>> sense to keep running the task.
>>> * the log from the optimization task process shows that the task is not
>>> going according to plan, and so the user wants to interrupt it to check why
>>> and send another one.
>>> * some input data or hypothesis has changed and the ongoing optimization
>>> task is obsolete.
>>>
>>> thanks again!
>>>
>>> Franco
>>>
>>>
>>> On Fri, Jan 22, 2021 at 10:29 AM Anton Erholt <
>>> anton.erholt@epidemicsound.com> wrote:
>>>
>>>> Apologies for not answering your question about how to stop DAG runs; I
>>>> do not know that. However, I wonder if it would make sense to pass along
>>>> how long the job should run as a parameter to the optimization task, and
>>>> when it times out, exit/return appropriately so Airflow can read it?
>>>>
>>>> Best,
>>>> Anton
>>>>
>>>> On Fri, Jan 22, 2021 at 08:01 Franco Peschiera <franco.peschiera@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello again everyone,
>>>>>
>>>>> As the title says: I would like to be able to stop / kill a dagrun.
>>>>> I’ve seen this question asked here
>>>>> <https://stackoverflow.com/questions/43631693/how-to-stop-kill-airflow-tasks-from-the-ui>
>>>>> and here
>>>>> <https://stackoverflow.com/questions/49039386/how-do-i-stop-an-airflow-dag>
>>>>> .
>>>>> Several solutions are proposed but I was wondering if there is a
>>>>> “correct” way to stop a dagrun. I’m guessing from the airflow 2.0 docs
>>>>> of the REST API that it is probably not possible from there (since I
>>>>> did not see that). And since I do not see it anywhere in the docs, I
>>>>> fear there may not be a good way to do that properly. Is that so? Is
>>>>> there even an “improper” way?
>>>>>
>>>>> As context, the tasks that we want to schedule are optimization tasks
>>>>> that do not have a fixed time to run. Users usually put a time limit,
>>>>> e.g., an hour, and we would anyway put one by default if they don’t.
>>>>> But, in general, users may want to stop an execution if they see it
>>>>> takes too long or if they want to change something before running it
>>>>> again. So scheduling and stopping dagruns should be a “common” thing
>>>>> to do.
>>>>>
>>>>> Thanks as always!
>>>>>
>>>>> Franco
>>>>>
>>>>
