airflow-users mailing list archives

From Jarek Potiuk <ja...@potiuk.com>
Subject Re: Accessing external DAGs from Docker Container
Date Mon, 09 Aug 2021 14:40:03 GMT
You do not need to set AIRFLOW_HOME if you use the official image. See
https://airflow.apache.org/docs/docker-stack/index.html for details.

I am guessing, but do you happen to use a remote Docker engine rather than a
locally running one? If so, mounting local volumes (bind mounts) only works
if the engine runs on the same host as the volumes, or if you manually make
sure the same folders exist on every host machine your container engine can
run containers on: https://docs.docker.com/storage/bind-mounts/
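
For example, with a bind mount like the one below, the host path on the left
has to exist on the machine where the engine actually starts the containers
(the /home/etl path here is only an illustration, borrowed from the
commented-out plugins line in your compose file):

  volumes:
    # host path (must exist on the Docker engine host) : path inside the container
    - /home/etl/airflow/dags:/opt/airflow/dags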

The local mount we have in our quick-start compose is only really useful for
that - the quick start - and maybe for some very simple use cases where you
are OK with running the whole of Airflow on a single machine (but that kind
of defeats the purpose of the scalability part of Airflow).

If you want to continue using docker-compose you should use a more
sophisticated mechanism - for example another way of using volumes,
https://docs.docker.com/storage/volumes/ - but this is usually not as easy as
mounting local folders: you need to create and manage the volumes and add
some backing store for them if you want to share data between multiple
running containers. Or you can prepare a custom image where you bake the dags
in (but that limits the 'flexibility', as you have to rebuild the image every
time you change the dags).
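
Just as an illustration of the named-volume variant (the names here are made
up, and you still need some shared backing store if your workers run on
different hosts), it looks roughly like this in compose:

x-airflow-common:
  &airflow-common
  volumes:
    # a named volume managed by Docker instead of a bind mount
    - airflow-dags:/opt/airflow/dags

volumes:
  airflow-dags:
    # add driver/driver_opts here if you want to back it with shared
    # storage (for example NFS) visible to all hosts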

Due to the complexity and variability of potential use cases we do not yet
have community support for (the many variants of) docker-compose suitable for
production use - so if you want to go that route, there will be limited
support from the community. There are multiple ways you can share volumes and
configure them, so it's hard to support or even suggest any single way.

However, if you want to make a more 'serious' deployment of Airflow, I'd
heartily recommend going with Kubernetes and the community-supported official
Helm chart instead. It has all the different options of DAG sharing
(including the best one IMHO - Git-sync), it has full community support, and
it is currently the state-of-the-art way of deploying Airflow. And it has a
lot of other useful features which are easy to use and configure.
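
To give you an idea (the exact key names may differ between chart versions,
so please check the chart documentation rather than copying this verbatim),
enabling Git-sync in the chart values looks roughly like:

dags:
  gitSync:
    enabled: true
    # repository and branch holding your dags - placeholders only
    repo: https://github.com/your-org/your-dags-repo.git
    branch: main
    subPath: dags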

J.

On Mon, Aug 9, 2021, 15:47 Anthony Joyce <anthony.joyce@omicronmedia.com>
wrote:

> Hi Franco,
>
> Adding the AIRFLOW_HOME environment variable didn’t seem to help,
> although your suggestion makes sense. I am not completely sure why it is
> not being picked up. See below. (I put in the absolute path with and
> without single quotes to test it out.)
>
> version: '3'
> x-airflow-common:
>   &airflow-common
>   image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.1.2}
>   environment:
>     &airflow-common-env
>     AIRFLOW__CORE__EXECUTOR: CeleryExecutor
>     AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow@{host}/airflow
>     AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow@{host}/airflow
>     AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
>     AIRFLOW__CORE__FERNET_KEY: {key}
>     AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
>     AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
>     AIRFLOW__CORE__DAGS_FOLDER: ‘{HOME}/airflow/dags'
>     AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
>     _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
>   volumes:
>     - ./dags:/opt/airflow/dags
>     - ./logs:/opt/airflow/logs
>
> Perhaps it has something to do with these WARNINGs I have been receiving
> from flower?
>
> flower_1             | [2021-08-09 13:28:37,441] {mixins.py:229} INFO -
> Connected to redis://redis:6379/0
> flower_1             | [2021-08-09 13:28:38,654] {inspector.py:42}
> WARNING - Inspect method stats failed
> flower_1             | [2021-08-09 13:28:38,655] {inspector.py:42}
> WARNING - Inspect method conf failed
> flower_1             | [2021-08-09 13:28:38,663] {inspector.py:42}
> WARNING - Inspect method active_queues failed
> flower_1             | [2021-08-09 13:28:38,668] {inspector.py:42}
> WARNING - Inspect method scheduled failed
> flower_1             | [2021-08-09 13:28:38,669] {inspector.py:42}
> WARNING - Inspect method revoked failed
> flower_1             | [2021-08-09 13:28:38,670] {inspector.py:42}
> WARNING - Inspect method active failed
> flower_1             | [2021-08-09 13:28:38,671] {inspector.py:42}
> WARNING - Inspect method reserved failed
> flower_1             | [2021-08-09 13:28:38,672] {inspector.py:42}
> WARNING - Inspect method registered failed
>
>
> Any other suggestions welcome.
>
> Thanks,
>
> Anthony
>
>
> On Aug 9, 2021, at 8:47 AM, Franco Peschiera <franco.peschiera@gmail.com>
> wrote:
>
>
> I don't see that you have configured the AIRFLOW_HOME environment variable.
> Try setting it to the absolute path so that it can find your dags. Also,
> there is an environment variable where you can manually pass the path to
> the DAGs folder: AIRFLOW__CORE__DAGS_FOLDER.
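> For example, something along these lines in the compose environment section
> (the path is the container-side location the quick-start compose mounts the
> dags to, not a path on your host):
>
>     environment:
>       # inside the container the dags live under /opt/airflow
>       AIRFLOW__CORE__DAGS_FOLDER: /opt/airflow/dags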
>
> On Mon, Aug 9, 2021, 14:39 Anthony Joyce <anthony.joyce@omicronmedia.com>
> wrote:
>
>> Hi all,
>>
>> This is my first time on the email list, so thank you in advance for the
>> help.
>>
>> Here is my situation:
>>
>> We are running Airflow 1.10.10 (a pretty old version at this point)
>> without a container, built from pip on CentOS. Instead of updating
>> Anaconda and dealing with dependency hell, I decided to download the
>> official apache/airflow Docker image and try to configure it against our
>> already existing meta database and DAGs. The container seems to have
>> initialized successfully, picking up our Variables and Connections via
>> our existing Postgres 13 meta database, with all containers healthy at
>> this point. However, I am having a problem connecting our external
>> Airflow DAGs folder (~/airflow/dags). I have copied the contents of our
>> ~/airflow/dags folder into the ./dags folder, but that doesn’t seem to help.
>>
>> Do you all have any advice/suggestions regarding this issue?
>>
>> The redacted docker-compose.yaml file is below.
>>
>> Thank you for the help!
>>
>> Best,
>>
>> Anthony
>>
>> version: '3'
>> x-airflow-common:
>>   &airflow-common
>>   image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.1.2}
>>   environment:
>>     &airflow-common-env
>>     AIRFLOW__CORE__EXECUTOR: CeleryExecutor
>>     AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow@{host}/airflow
>>     AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow@{host}/airflow
>>     AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
>>     AIRFLOW__CORE__FERNET_KEY: {key}
>>     AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
>>     AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
>>     AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
>>     _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
>>   volumes:
>>     - ./dags:/opt/airflow/dags
>>     - ./logs:/opt/airflow/logs
>>     #- /home/etl/airflow/plugins/:/opt/airflow/plugins
>>   user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-0}"
>>   depends_on:
>>     redis:
>>       condition: service_healthy
>>     postgres:
>>       condition: service_healthy
>>
>> services:
>>   postgres:
>>     image: postgres:13
>>     environment:
>>       POSTGRES_USER: airflow
>>       POSTGRES_PASSWORD: airflow
>>       POSTGRES_DB: airflow
>>     volumes:
>>       - postgres-db-volume:/localdata/pgdata
>>     healthcheck:
>>       test: ["CMD", "pg_isready", "-U", "airflow"]
>>       interval: 5s
>>       retries: 5
>>     restart: always
>>
>>   redis:
>>     image: redis:latest
>>     ports:
>>       - 6379:6379
>>     healthcheck:
>>       test: ["CMD", "redis-cli", "ping"]
>>       interval: 5s
>>       timeout: 30s
>>       retries: 50
>>     restart: always
>>
>>   airflow-webserver:
>>     <<: *airflow-common
>>     command: webserver
>>     ports:
>>       - 8080:8080
>>     healthcheck:
>>       test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
>>       interval: 10s
>>       timeout: 10s
>>       retries: 5
>>     restart: always
>>
>>   airflow-scheduler:
>>     <<: *airflow-common
>>     command: scheduler
>>     healthcheck:
>>       test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
>>       interval: 10s
>>       timeout: 10s
>>       retries: 5
>>     restart: always
>>
>>   airflow-worker:
>>     <<: *airflow-common
>>     command: celery worker
>>     healthcheck:
>>       test:
>>         - "CMD-SHELL"
>>         - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
>>       interval: 10s
>>       timeout: 10s
>>       retries: 5
>>     restart: always
>>
>>   airflow-init:
>>     <<: *airflow-common
>>     command: version
>>     environment:
>>       <<: *airflow-common-env
>>       _AIRFLOW_DB_UPGRADE: 'true'
>>       _AIRFLOW_WWW_USER_CREATE: 'true'
>>       _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
>>       _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
>>
>>   flower:
>>     <<: *airflow-common
>>     command: celery flower
>>     ports:
>>       - 5555:5555
>>     healthcheck:
>>       test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
>>       interval: 10s
>>       timeout: 10s
>>       retries: 5
>>     restart: always
>>
>> volumes:
>>   postgres-db-volume:
>>
>>
>>
>>
>
