airflow-users mailing list archives

From Ash Berlin-Taylor <...@apache.org>
Subject Re: Stopping / killing dag runs gracefully
Date Wed, 24 Feb 2021 10:35:07 GMT
Oh yes, that endpoint is the right one to use, and I agree that it is 
slightly out of place -- it probably should have been under DagRun but 
got missed in the review.

(It probably also should have used run_id not execution_date. Oops.)

> From the tests I’ve done, it appears to work as expected: changing 
> the task state to "success" or "failed" terminates the execution. Can 
> someone confirm this endpoint is indeed used to change the status of 
> tasks and thus stop the execution of the dagruns they belong to?
> 
This will stop an individual task, not the dag run as a whole, but yes, 
setting the state and having it stop the in-progress execution is 
expected/supported behaviour.
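For reference, such a call can be sketched against the Airflow 2.0 stable REST API with Python's standard library. The base URL, dag/task ids, and date below are placeholders and authentication is omitted; adapt them to your deployment:

```python
# Sketch of a call to the stable REST API's updateTaskInstancesState
# endpoint (POST /api/v1/dags/{dag_id}/updateTaskInstancesState).
# All concrete values below are placeholders.
import json
import urllib.request

def build_update_ti_state_request(base_url, dag_id, task_id,
                                  execution_date, new_state,
                                  dry_run=True):
    """Build (but do not send) the POST request that sets a task
    instance's state, which also stops its in-progress execution."""
    url = f"{base_url}/api/v1/dags/{dag_id}/updateTaskInstancesState"
    body = {
        "dry_run": dry_run,                # set to False to apply the change
        "task_id": task_id,
        "execution_date": execution_date,  # the endpoint keys on execution_date, not run_id
        "include_upstream": False,
        "include_downstream": False,
        "include_future": False,
        "include_past": False,
        "new_state": new_state,            # "success" or "failed"
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_update_ti_state_request(
    "http://localhost:8080", "timer", "solve",
    "2021-02-24T00:00:00+00:00", "failed",
)
# urllib.request.urlopen(req) would send it once auth is added.
```

urllib is used only to keep the sketch dependency-free; any HTTP client works the same way.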


-ash.



On Tue, 23 Feb, 2021 at 23:17, Franco Peschiera 
<franco.peschiera@gmail.com> wrote:
> Hello again,
> 
> Ok, so while looking into the code to actually make the change, I 
> discovered there is already a way to do this from the current REST 
> API; I just had not found it before.
> 
> It appears the updateTaskInstancesState endpoint is what we need. We 
> just need to know the “dag+task+execution_date” for our dagrun 
> and change the status of that task. I had not found this endpoint 
> before because for some reason it is included inside the DAG section 
> of the API (it’s the last one). My reflex would be “everything 
> under DAG goes at a DAG-level of detail” but in fact, 
> updateTaskInstancesState asks for an execution_date (among other 
> things) in the request body of the endpoint so it looks more like a 
> “dag_run state editor” than a “task state editor” or “dag 
> state editor”. Also, the name of the endpoint does not help much.
> 
> From the tests I’ve done, it appears to work as expected: changing 
> the task state to "success" or "failed" terminates the execution. Can 
> someone confirm this endpoint is indeed used to change the status of 
> tasks and thus stop the execution of the dagruns they belong to?
> 
> thanks again,
> 
> Franco
> 
> 
> On Tue, Feb 23, 2021 at 6:22 PM Franco Peschiera 
> <franco.peschiera@gmail.com <mailto:franco.peschiera@gmail.com>> 
> wrote:
>> Thanks Ash.
>> 
>> Yes, we can attempt some PRs to make it happen. I'll check how the 
>> GUI now allows the update of dagrun's state. If I have any 
>> questions, I'll look for help in slack.
>> 
>> regards,
>> 
>> Franco
>> 
>> On Tue, Feb 23, 2021 at 3:35 PM Ash Berlin-Taylor <ash@apache.org 
>> <mailto:ash@apache.org>> wrote:
>>> Hi Franco,
>>> 
>>> Your use case definitely sounds like a valid one.
>>> 
>>> I personally wouldn't approach this by deleting the DAG run, as I 
>>> prefer not to delete things where possible (though the fact that 
>>> deleting it doesn't cancel in-progress tasks is a bug in itself) -- 
>>> instead I would say there should be an endpoint to set the state of 
>>> the DAG run to success/failed, or perhaps to a new state of 
>>> "cancelled", and that should cascade down to the running Task 
>>> Instances.
>>> 
>>> Would you be up for attempting some PRs to make some/all of this 
>>> happen? We're happy to help guide you through and get this merged 
>>> as it sounds great.
>>> 
>>> Ash
>>> (Airflow PMC member)
>>> 
>>> On Tue, 23 Feb, 2021 at 15:15, Franco Peschiera 
>>> <franco.peschiera@gmail.com <mailto:franco.peschiera@gmail.com>> 
>>> wrote:
>>>> Hello again everyone,
>>>> 
>>>> We've continued researching the possibility of interrupting 
>>>> airflow dag runs by command (from outside the Airflow GUI). I put 
>>>> below our notes on this research. As I said in my previous email, 
>>>> I'm not sure this is something the developers would like to have 
>>>> as a functionality inside airflow. We're, of course, willing to 
>>>> discuss this and see if we can help make it happen. If you prefer, 
>>>> I can write instead to the developers mailing list or Slack 
>>>> channel.
>>>> 
>>>> In case you're curious what we're using airflow for, you can check 
>>>> the project here <https://github.com/baobabsoluciones/corn> or 
>>>> here <https://baobabsoluciones.github.io/corn/>
>>>> 
>>>> In our deployments using CeleryExecutor we have the following flow 
>>>> of communication of new tasks to workers.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> According to it, we have three options: we can stop the tasks from 
>>>> the Airflow Scheduler, from the Flower worker, or maybe write a 
>>>> new state into the database. We want to discard this last option 
>>>> as it is too intrusive and not very traceable.
>>>> 
>>>> Terminate tasks from Airflow GUI
>>>> The best option to stop all DAGrun and Flower task execution is to 
>>>> mark the DAGs as "Failed" from the Airflow GUI. When this happens, 
>>>> it automatically revokes all pending Flower tasks on workers. 
>>>> There is, however, no REST method such as PUT or PATCH to change 
>>>> the state of DAGRuns; that would be the ideal option for the task 
>>>> of killing all pending processes.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> Deleting the DAGrun does not end the Flower tasks; even so, we 
>>>> can perform the operation with the Airflow REST API as follows:
>>>> 
>>>> Delete Airflow DAGrun (with DagRun id -> $dagrun_id)
>>>> curl -X DELETE \
>>>>   "https://devsm.cornflow.baobabsoluciones.app/devsm/airflow/api/v1/dags/timer/dagRuns/$dagrun_id" \
>>>>   -H 'Content-Type: application/json' --user "admin:admin"
>>>> 
>>>> Terminate tasks from Flower API or GUI
>>>> Flower allows changing the status of tasks through the REST API. 
>>>> As seen below, we can abort or terminate the tasks as follows:
>>>> 
>>>> Terminate tasks in Flower REST (with task id -> $root_id)
>>>> curl -X POST \
>>>>   "https://devsm.cornflow.baobabsoluciones.app/devsm/flower/api/task/revoke/$root_id?terminate=true" \
>>>>   --user "admin:admin"
>>>> 
>>>> We can also finish the tasks from Flower GUI satisfactorily.
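The same revoke call as the curl above can be built with Python's standard library; a dependency-free sketch (the Flower URL and task id are placeholders, and basic auth is omitted):

```python
import urllib.request

def build_flower_revoke_request(flower_url, task_id, terminate=True):
    # Flower revokes a Celery task by id; terminate=true also signals
    # the worker process already running the task, rather than only
    # preventing a not-yet-started task from running.
    flag = "true" if terminate else "false"
    url = f"{flower_url}/api/task/revoke/{task_id}?terminate={flag}"
    return urllib.request.Request(url, method="POST")

req = build_flower_revoke_request("http://localhost:5555", "some-celery-task-id")
# urllib.request.urlopen(req) would send it once auth is added.
```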
>>>> 
>>>> 
>>>> 
>>>> 
>>>> The problem with finishing tasks from Flower is that they continue 
>>>> to show as running in Airflow for an indeterminate time, and new 
>>>> tasks are not sent to workers even though workers are available.
>>>> 
>>>> thanks,
>>>> 
>>>> Franco
>>>> 
>>>> On Fri, Jan 22, 2021 at 11:23 AM Franco Peschiera 
>>>> <franco.peschiera@gmail.com <mailto:franco.peschiera@gmail.com>> 
>>>> wrote:
>>>>> Thanks for the answer Anton.
>>>>> It definitely makes sense. In fact, that's what we already do: we 
>>>>> always pass a time limit to the optimization task and it's 
>>>>> respected. But we still want to reserve the possibility to stop 
>>>>> the run before the time limit has been reached. Tasks can have a 
>>>>> time limit from several seconds to several hours (or days) and 
>>>>> some example reasons why we may want to interrupt an ongoing task 
>>>>> before the time limit are:
>>>>> 
>>>>> * the user realizes the input data is wrong, and so it doesn't 
>>>>> make sense to keep running the task.
>>>>> * the log from the optimization task process shows that the task 
>>>>> is not going according to plan, and so the user wants to 
>>>>> interrupt it to check why and send another one.
>>>>> * some input data or hypothesis has changed and the ongoing 
>>>>> optimization task is obsolete.
>>>>> 
>>>>> thanks again!
>>>>> 
>>>>> Franco
>>>>> 
>>>>> 
>>>>> On Fri, Jan 22, 2021 at 10:29 AM Anton Erholt 
>>>>> <anton.erholt@epidemicsound.com 
>>>>> <mailto:anton.erholt@epidemicsound.com>> wrote:
>>>>>> Apologies for not answering your question about how to stop DAG 
>>>>>> runs; I do not know that. However, I wonder if it would make 
>>>>>> sense to pass along how long the job should run as a parameter 
>>>>>> to the optimization task, and when it times out, exit/return 
>>>>>> appropriately so Airflow can read it?
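Anton's suggestion can be sketched in a few lines of Python; the function and parameter names here are illustrative, not an Airflow API:

```python
# A minimal sketch, assuming the solver can work in bounded chunks:
# the task receives its time limit as a parameter and stops itself
# when the limit elapses, instead of being killed from outside.
import time

def run_with_time_limit(solve_step, time_limit_s):
    """Run solve_step() repeatedly until it reports completion or
    time_limit_s elapses, then return a status the Airflow task can
    use to exit/return appropriately.

    solve_step() should perform one bounded chunk of work and return
    True when the optimization is finished.
    """
    deadline = time.monotonic() + time_limit_s
    while time.monotonic() < deadline:
        if solve_step():
            return "done"
    return "timed_out"
```

In an Airflow task, this status could be translated into a normal return (success) or a raised exception (failed), so the scheduler records the outcome without anyone having to kill the run externally.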
>>>>>> 
>>>>>> Best,
>>>>>> Anton
>>>>>> 
>>>>>> On Fri, 22 Jan 2021 at 08:01, Franco Peschiera 
>>>>>> <franco.peschiera@gmail.com <mailto:franco.peschiera@gmail.com>> 
>>>>>> wrote:
>>>>>>> Hello again everyone,
>>>>>>> 
>>>>>>> As the title says: I would like to be able to stop / kill a 
>>>>>>> dagrun. I’ve seen this question asked here 
>>>>>>> <https://stackoverflow.com/questions/43631693/how-to-stop-kill-airflow-tasks-from-the-ui> 
>>>>>>> and here 
>>>>>>> <https://stackoverflow.com/questions/49039386/how-do-i-stop-an-airflow-dag>. 
>>>>>>> Several solutions are proposed, but I was wondering if there is 
>>>>>>> a “correct” way to stop a dagrun. I’m guessing from the 
>>>>>>> airflow 2.0 docs of the REST API that it is probably not 
>>>>>>> possible from there (since I did not see it there). And since I 
>>>>>>> do not see it anywhere in the docs, I fear there may not be a 
>>>>>>> good way to do that properly. Is that so? Is there even an 
>>>>>>> “improper” way?
>>>>>>> 
>>>>>>> As context, the tasks that we want to schedule are optimization 
>>>>>>> tasks that do not have a fixed time to run. Users usually put a 
>>>>>>> time limit, e.g., an hour, and we would anyway put one by 
>>>>>>> default if they don’t. But, in general, users may want to 
>>>>>>> stop an execution if they see it takes too long or if they want 
>>>>>>> to change something before running it again. So scheduling and 
>>>>>>> stopping dagruns should be a “common” thing to do.
>>>>>>> 
>>>>>>> Thanks as always!
>>>>>>> 
>>>>>>> Franco
>>>>>>> 

