airflow-users mailing list archives

From Jarek Potiuk <ja...@potiuk.com>
Subject Re: Accessing external DAGs from Docker Container
Date Mon, 09 Aug 2021 14:42:21 GMT
Helm chart link here:
https://airflow.apache.org/docs/helm-chart/stable/index.html

On Mon, Aug 9, 2021, 16:40 Jarek Potiuk <jarek@potiuk.com> wrote:

> You do not need to set AIRFLOW_HOME if you use the official image. See
> https://airflow.apache.org/docs/docker-stack/index.html for details.
>
> I am guessing, but do you happen to use a remote Docker engine rather than
> a locally running one? If so, mounting local volumes (bind mounts) only
> works if the engine runs on the same host as the volumes, or when you
> manually take care of mapping the same folders to all the host machines
> that your container engine can run containers on:
> https://docs.docker.com/storage/bind-mounts/
>
> The local mount we have in our quick-start compose is only really useful
> for that - the quick start - and maybe for some very simple use cases
> where you are OK with running the whole of Airflow on a single machine
> (but that rather defeats the purpose of the scalability part of Airflow).
>
> If you want to continue using docker-compose you should use much more
> sophisticated mechanisms - for example another way of using volumes,
> https://docs.docker.com/storage/volumes/ - but this is usually not as easy
> as mounting local folders: you need to create and manage the volumes and
> add some backing store for them if you want to share data between multiple
> running containers (see the sketch below). Or you can prepare a custom
> image where you bake the DAGs in (but this limits flexibility, as you have
> to rebuild the image every time you change the DAGs).
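>
> For illustration only, a minimal sketch of the named-volume approach in
> docker-compose (the volume name and the NFS backing store below are
> made-up placeholders - substitute whatever shared storage you actually
> have):
>
>   volumes:
>     airflow-dags:
>       driver: local
>       driver_opts:
>         type: nfs
>         o: "addr=nfs.example.internal,ro"
>         device: ":/exports/airflow-dags"
>
>   # and in x-airflow-common, mount the named volume instead of ./dags:
>   #   volumes:
>   #     - airflow-dags:/opt/airflow/dags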
>
> Due to the complexity and variability of potential use cases we do not yet
> have community support for the (many variants of) docker-compose files
> suitable for production use - so if you want to go that route, there will
> be limited support from the community; there are multiple ways you can
> share volumes and configure them, so it's hard to support or even suggest
> any single way.
>
> However, if you want to make a more 'serious' deployment of Airflow, I'd
> heartily recommend going with Kubernetes and the Airflow-community-supported
> official Helm chart instead. It has all the different options of DAG
> sharing (including, IMHO, the best one - git-sync). It has full community
> support and is currently the state of the art for deploying Airflow. And it
> has a lot of other useful features which are easy to use and configure.
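>
> For example, enabling git-sync in the chart's values.yaml looks roughly
> like this (a sketch - the repository URL and branch are placeholders, and
> the exact keys can differ between chart versions):
>
>   dags:
>     gitSync:
>       enabled: true
>       repo: https://github.com/example-org/airflow-dags.git
>       branch: main
>       subPath: dags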
>
> J.
>
> On Mon, Aug 9, 2021, 15:47 Anthony Joyce <anthony.joyce@omicronmedia.com> wrote:
>
>> Hi Franco,
>>
>> Adding the Airflow home environment variable didn’t seem to help,
>> although your suggestion makes sense. I am not completely sure why it is
>> not being picked up. See below. (I put in the absolute path with and
>> without single quotes to test it out.)
>>
>> version: '3'
>> x-airflow-common:
>>   &airflow-common
>>   image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.1.2}
>>   environment:
>>     &airflow-common-env
>>     AIRFLOW__CORE__EXECUTOR: CeleryExecutor
>>     AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow@{host}/airflow
>>     AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow@{host}/airflow
>>     AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
>>     AIRFLOW__CORE__FERNET_KEY: {key}
>>     AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
>>     AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
>>     AIRFLOW__CORE__DAGS_FOLDER: '{HOME}/airflow/dags'
>>     AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
>>     _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
>>   volumes:
>>     - ./dags:/opt/airflow/dags
>>     - ./logs:/opt/airflow/logs
>>
>> Perhaps it has something to do with these WARNINGs I have been receiving
>> from flower?
>>
>> flower_1             | [2021-08-09 13:28:37,441] {mixins.py:229} INFO -
>> Connected to redis://redis:6379/0
>> flower_1             | [2021-08-09 13:28:38,654] {inspector.py:42}
>> WARNING - Inspect method stats failed
>> flower_1             | [2021-08-09 13:28:38,655] {inspector.py:42}
>> WARNING - Inspect method conf failed
>> flower_1             | [2021-08-09 13:28:38,663] {inspector.py:42}
>> WARNING - Inspect method active_queues failed
>> flower_1             | [2021-08-09 13:28:38,668] {inspector.py:42}
>> WARNING - Inspect method scheduled failed
>> flower_1             | [2021-08-09 13:28:38,669] {inspector.py:42}
>> WARNING - Inspect method revoked failed
>> flower_1             | [2021-08-09 13:28:38,670] {inspector.py:42}
>> WARNING - Inspect method active failed
>> flower_1             | [2021-08-09 13:28:38,671] {inspector.py:42}
>> WARNING - Inspect method reserved failed
>> flower_1             | [2021-08-09 13:28:38,672] {inspector.py:42}
>> WARNING - Inspect method registered failed
>>
>>
>> Any other suggestions welcome.
>>
>> Thanks,
>>
>> Anthony
>>
>>
>> On Aug 9, 2021, at 8:47 AM, Franco Peschiera <franco.peschiera@gmail.com>
>> wrote:
>>
>> I don't see that you have configured the AIRFLOW_HOME environment variable.
>> Try setting it to an absolute path so that Airflow can find your DAGs. Also,
>> there is an environment variable where you can manually pass the path to
>> the DAGs folder: AIRFLOW__CORE__DAGS_FOLDER.
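>>
>> For example (a sketch of one way to wire it up - the host path is a guess
>> based on your setup; the key point is that DAGS_FOLDER should point at the
>> path inside the container that your volume is mounted to):
>>
>>     AIRFLOW__CORE__DAGS_FOLDER: /opt/airflow/dags
>>
>> together with a matching bind mount in the volumes section:
>>
>>   volumes:
>>     - /home/etl/airflow/dags:/opt/airflow/dags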
>>
>> On Mon, Aug 9, 2021, 14:39 Anthony Joyce <anthony.joyce@omicronmedia.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> This is my first time on the email list, so I thank you in advance for
>>> the help.
>>>
>>> Here is my situation:
>>>
>>> We are running Airflow 1.10.10 - a pretty old version at this point -
>>> without a container, installed from pip on CentOS. Instead of updating
>>> Anaconda and dealing with dependency hell, I decided to download the
>>> official apache/airflow Docker image and try to configure it against my
>>> already existing meta database and DAGs. It seems the containers
>>> initialized successfully, picking up our Variables and Connections via
>>> our existing Postgres 13 meta database, with all containers healthy at
>>> this point. However, I am having a problem connecting our external
>>> Airflow DAGs folder (~/airflow/dags). I have copied the contents of our
>>> ~/airflow/dags folder into the ./dags folder, but that doesn’t seem to help.
>>>
>>> Do you all have any advice/suggestions regarding this issue?
>>>
>>> Here is the redacted docker-compose.yaml file below.
>>>
>>> Thank you for the help!
>>>
>>> Best,
>>>
>>> Anthony
>>>
>>> version: '3'
>>> x-airflow-common:
>>>   &airflow-common
>>>   image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.1.2}
>>>   environment:
>>>     &airflow-common-env
>>>     AIRFLOW__CORE__EXECUTOR: CeleryExecutor
>>>     AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow@{host}/airflow
>>>     AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow@{host}/airflow
>>>     AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
>>>     AIRFLOW__CORE__FERNET_KEY: {key}
>>>     AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
>>>     AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
>>>     AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
>>>     _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
>>>   volumes:
>>>     - ./dags:/opt/airflow/dags
>>>     - ./logs:/opt/airflow/logs
>>>     #- /home/etl/airflow/plugins/:/opt/airflow/plugins
>>>   user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-0}"
>>>   depends_on:
>>>     redis:
>>>       condition: service_healthy
>>>     postgres:
>>>       condition: service_healthy
>>>
>>> services:
>>>   postgres:
>>>     image: postgres:13
>>>     environment:
>>>       POSTGRES_USER: airflow
>>>       POSTGRES_PASSWORD: airflow
>>>       POSTGRES_DB: airflow
>>>     volumes:
>>>       - postgres-db-volume:/localdata/pgdata
>>>     healthcheck:
>>>       test: ["CMD", "pg_isready", "-U", "airflow"]
>>>       interval: 5s
>>>       retries: 5
>>>     restart: always
>>>
>>>   redis:
>>>     image: redis:latest
>>>     ports:
>>>       - 6379:6379
>>>     healthcheck:
>>>       test: ["CMD", "redis-cli", "ping"]
>>>       interval: 5s
>>>       timeout: 30s
>>>       retries: 50
>>>     restart: always
>>>
>>>   airflow-webserver:
>>>     <<: *airflow-common
>>>     command: webserver
>>>     ports:
>>>       - 8080:8080
>>>     healthcheck:
>>>       test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
>>>       interval: 10s
>>>       timeout: 10s
>>>       retries: 5
>>>     restart: always
>>>
>>>   airflow-scheduler:
>>>     <<: *airflow-common
>>>     command: scheduler
>>>     healthcheck:
>>>       test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
>>>       interval: 10s
>>>       timeout: 10s
>>>       retries: 5
>>>     restart: always
>>>
>>>   airflow-worker:
>>>     <<: *airflow-common
>>>     command: celery worker
>>>     healthcheck:
>>>       test:
>>>         - "CMD-SHELL"
>>>         - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
>>>       interval: 10s
>>>       timeout: 10s
>>>       retries: 5
>>>     restart: always
>>>
>>>   airflow-init:
>>>     <<: *airflow-common
>>>     command: version
>>>     environment:
>>>       <<: *airflow-common-env
>>>       _AIRFLOW_DB_UPGRADE: 'true'
>>>       _AIRFLOW_WWW_USER_CREATE: 'true'
>>>       _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
>>>       _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
>>>
>>>   flower:
>>>     <<: *airflow-common
>>>     command: celery flower
>>>     ports:
>>>       - 5555:5555
>>>     healthcheck:
>>>       test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
>>>       interval: 10s
>>>       timeout: 10s
>>>       retries: 5
>>>     restart: always
>>>
>>> volumes:
>>>   postgres-db-volume:
>>>
>>>
>>>
>>>
>>
