airflow-dev mailing list archives

From Germain TANGUY <germain.tan...@dailymotion.com>
Subject Re: Deploy procedure for new/modify dags
Date Thu, 20 Jul 2017 13:10:27 GMT
Hello Arthur,

Thanks for your help,

In your case I will have to update the worker code, not necessarily the
webserver/scheduler, and I will set the option --ship_dag to False.

This deployment method implies that I have to pause all my DAGs, wait until my queue is
empty, and restart my workers to pull and install the new code and dependencies. I have
some external dependencies which take time to pip install, so my service won’t be available
during this time. Am I correct in assuming this?
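The drain-then-restart step could be sketched like this (a minimal sketch; the `pending_count` callable is an assumption standing in for whatever reports outstanding Celery tasks, e.g. a broker or inspect query):

```python
import time

# Sketch of the "pause, drain, restart" step described above.
# pending_count is a hypothetical callable returning the number of
# tasks still queued; it is not an Airflow or Celery API.

def wait_until_drained(pending_count, poll_seconds=30, timeout=3600):
    """Block until no tasks remain queued, or raise after `timeout` seconds."""
    waited = 0
    while pending_count() > 0:
        if waited >= timeout:
            raise TimeoutError("queue did not drain within timeout")
        time.sleep(poll_seconds)
        waited += poll_seconds
```

Only once this returns would the workers be restarted and the slow pip install run, which is why the service stays unavailable for the whole window.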

I discovered that we can specify the queue that the scheduler pushes tasks to and that the
workers listen on. Could it be a viable solution to create a queue for each commit, to
deploy a new set of workers for each commit, and to kill the old ones once they have
nothing left in their old queue?
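A minimal sketch of that idea, assuming the queue name is derived from the git commit SHA (the helper names below are hypothetical; in Airflow this would correspond to the `queue=` argument on an operator plus starting workers with `airflow worker -q <queue>`):

```python
# Hypothetical sketch of per-commit queue routing. Each deploy derives
# a queue name from the commit SHA; workers started for that deploy
# subscribe only to their own queue, so the previous worker set can be
# retired once its queue drains.

def queue_for_commit(sha):
    """Queue name for the worker set deployed at a given commit."""
    return "deploy_" + sha[:8]

def route(task, current_sha):
    """Scheduler side: tag a task with the queue of the current deploy."""
    task["queue"] = queue_for_commit(current_sha)
    return task

task = route({"task_id": "run_script"}, "abc123def4567890")
print(task["queue"])  # deploy_abc123de
```

New tasks then only reach workers running the matching code version, while old workers finish what is already in their queue.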

Germain T.




On 19/07/17 07:53, "Arthur Wiedmer" <arthur.wiedmer@gmail.com> wrote:

    Hi Germain,
    
    As long as the structure of the DAG is not changed (tasks are the same and
    the dependency graph does not change), there should be no need to restart
    anything.
    
    The scheduler only needs the structure of the DAG to send the right message
    to celery. Essentially the message tells the worker to run an airflow run
    command for this dag_id, this task_id and the execution_date.
    While the webserver, for instance, might show you an older version of the
    bash script, the code executed will be the latest available on the worker.
    You should be able to verify this by checking the logs for the task, since
    the script is usually logged there.
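The command shape described above can be pictured as follows (a sketch of the Airflow 1.x CLI form; the helper function is illustrative, not an Airflow internal):

```python
# Illustrative sketch: the worker ends up executing an `airflow run`
# invocation built from the message's dag_id, task_id and execution_date.

def airflow_run_command(dag_id, task_id, execution_date):
    """Assemble the CLI invocation a worker would execute (sketch)."""
    return "airflow run {} {} {}".format(dag_id, task_id, execution_date)

print(airflow_run_command("my_dag", "my_task", "2017-07-20T00:00:00"))
```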
    
    I hope this helps,
    
    Sincerely,
    Arthur
    
    
    On Mon, Jul 17, 2017 at 11:56 PM, Germain TANGUY <
    germain.tanguy@dailymotion.com> wrote:
    
    > Hello everybody,
    >
    > I would like to know what your procedures are for deploying new versions
    > of your DAGs, especially DAGs that have external dependencies (bash
    > scripts, etc.).
    > I use CeleryExecutor with multiple workers, so there is an issue of
    > consistency between the workers, the scheduler, and the webserver.
    >
    > Today I pause the DAGs, wait until all running tasks complete, restart
    > all Airflow services, and unpause the DAGs. Is there a better way?
    >
    > Best regards,
    >
    > Germain T.
    >
    >
    
