airflow-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [airflow] efedotova commented on a change in pull request #6285: [AIRFLOW-XXX] Updates to Breeze documentation from GSOD
Date Sun, 13 Oct 2019 13:22:07 GMT
efedotova commented on a change in pull request #6285: [AIRFLOW-XXX] Updates to Breeze documentation
from GSOD
URL: https://github.com/apache/airflow/pull/6285#discussion_r334277841
 
 

 ##########
 File path: BREEZE.rst
 ##########
 @@ -46,409 +46,555 @@ Here is the short 10 minute video about Airflow Breeze
 Prerequisites
 =============
 
-Docker
-------
+Docker Community Edition
+-----------------------
 
-You need latest stable Docker Community Edition installed and on the PATH. It should be
-configured to be able to run ``docker`` commands directly and not only via root user. Your
user
-should be in the ``docker`` group. See `Docker installation guide <https://docs.docker.com/install/>`_
+- **Version**: Install the latest stable Docker Community Edition and add it to the PATH.
+- **Permissions**: Configure to run the ``docker`` commands directly and not only via root
user. Your user should be in the ``docker`` group. See `Docker installation guide <https://docs.docker.com/install/>`_
for details.
+- **Disk space**: On macOS, increase your available disk space before starting to work with
the environment. At least 128 GB of free disk space is recommended. You can also get by with
a smaller space but make sure to clean up the Docker disk space periodically. See also `Docker
for Mac - Space <https://docs.docker.com/docker-for-mac/space>`_ for details on increasing
disk space available for Docker on Mac.
 
-When you develop on Mac OS you usually have not enough disk space for Docker if you start
using it
-seriously. You should increase disk space available before starting to work with the environment.
-Usually you have weird problems of docker containers when you run out of Disk space. It might
not be
-obvious that space is an issue. At least 128 GB of Disk space is recommended. You can also
get by with smaller space but you should more
-often clean the docker disk space periodically.
+  Sometimes it is not obvious that space is an issue when you run into a problem with Docker. If you see weird behaviour, try `cleaning up the images <#cleaning-up-the-images>`_.
 
-If you get into weird behaviour try `Cleaning up the images <#cleaning-up-the-images>`_.
+Docker Compose
+--------------
 
-See also `Docker for Mac - Space <https://docs.docker.com/docker-for-mac/space>`_ for
details of increasing
-disk space available for Docker on Mac.
+- **Version**: Install the latest stable Docker Compose and add it to the PATH. See `Docker
Compose Installation Guide <https://docs.docker.com/compose/install/>`_ for details.
 
-Docker compose
---------------
+- **Permissions**: Configure to run the ``docker-compose`` command.
+
+Docker Images Used by Breeze
+----------------------------
 
-Latest stable Docker Compose installed and on the PATH. It should be
-configured to be able to run ``docker-compose`` command.
-See `Docker compose installation guide <https://docs.docker.com/compose/install/>`_
+For all development tasks, related integration tests and static code checks, we use Docker
+images maintained on the Docker Hub in the ``apache/airflow`` repository.
 
-Getopt and gstat
-----------------
+There are three images that we are currently managing:
+
+* **Slim CI** image that is used for static code checks (size of ~500MB). Its tag follows
the pattern
+  of ``<BRANCH>-python<PYTHON_VERSION>-ci-slim`` (for example, ``apache/airflow:master-python3.6-ci-slim``).
+  The image is built using the `<Dockerfile>`_ Dockerfile.
+* **Full CI image** that is used for testing. It contains a lot more test-related installed software
+  (size of ~1GB). Its tag follows the pattern of ``<BRANCH>-python<PYTHON_VERSION>-ci``
+  (for example, ``apache/airflow:master-python3.6-ci``). The image is built using the
+  `<Dockerfile>`_ Dockerfile.
+* **Checklicence image** that is used during license check with the Apache RAT tool. It does
not
+  require any of the dependencies that the two CI images need so it is built using a different
Dockerfile
+  `<Dockerfile-checklicence>`_ and only contains Java + Apache RAT tool. The image
is
+  labelled with ``checklicence``, for example: ``apache/airflow:checklicence``. No versioning
is used for
+  the Checklicence image.
+
+We also use a very small `<Dockerfile-context>`_ Dockerfile to fix file permissions
+for an obscure permission problem with Docker caching but it is not stored in the ``apache/airflow``
registry.
 
-* If you are on MacOS
+Before you run tests, enter the environment or run local static checks, the necessary local images should be
+pulled and built from Docker Hub. This happens automatically for the test environment but you need to
+manually trigger it for static checks as described in `Building the images <#building-the-images>`_
+and `Pulling the latest images <#pulling-the-latest-images>`_.
+The static checks will fail and tell you what to do if the image is not yet built.
+
+Building the image for the first time pulls a pre-built version of images from the Docker Hub, which may take some
+time. But for subsequent source code changes, no wait time is expected.
+However, changes to sensitive files like setup.py or Dockerfile will trigger a rebuild
+that may take more time though it is highly optimized to only rebuild what is needed.
 
-  * you need gnu ``getopt`` and ``gstat`` to get Airflow Breeze running.
+In most cases, rebuilding an image requires network connectivity (for example, to download
new
+dependencies). If you work offline and do not want to rebuild the images when needed, you
can set the 
+``FORCE_ANSWER_TO_QUESTIONS`` variable to ``no`` as described in the
+`Default behaviour for user interaction <#default-behaviour-for-user-interaction>`_
section.
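
A minimal sketch of the offline setup described above, assuming the variable is read from the shell environment that launches Breeze (the diff names the variable but does not show how it is set):

```shell
# Answer "no" to Breeze's interactive questions so that it does not
# try to rebuild images while you are offline.
# Assumption: Breeze picks this variable up from the calling shell.
export FORCE_ANSWER_TO_QUESTIONS="no"

# Then start Breeze as usual, for example: ./breeze
```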
 
-  * Typically you need to run ``brew install gnu-getopt coreutils`` and then follow instructions
(you need to link the gnu getopt
-    version to become first on the PATH). Make sure to re-login after yoy make the suggested
changes.
+See the `Troubleshooting section <#troubleshooting>`_ for steps you can take to clean the environment.
 
-  * Then (with brew) link the gnu-getopt to become default as suggested by brew.
+Getopt and gstat
+----------------
 
-  * If you use bash, you should run this command (and re-login):
+* For macOS, install GNU ``getopt`` and ``gstat`` utilities to get Airflow Breeze running.
 
-  .. code-block:: bash
+  Run ``brew install gnu-getopt coreutils`` and then follow instructions to link the gnu-getopt
version to become the first on the PATH. Make sure to re-login after you make the suggested
changes.
 
-      echo 'export PATH="/usr/local/opt/gnu-getopt/bin:$PATH"' >> ~/.bash_profile
-      . ~/.bash_profile
+  If you use bash, run this command and re-login:
 
-  * If you use zsh, you should run this command (and re-login):
+.. code-block:: bash
 
-  .. code-block:: bash
+    echo 'export PATH="/usr/local/opt/gnu-getopt/bin:$PATH"' >> ~/.bash_profile
+    . ~/.bash_profile
 
-      echo 'export PATH="/usr/local/opt/gnu-getopt/bin:$PATH"' >> ~/.zprofile
-      . ~/.zprofile
+..
 
-* If you are on Linux
+  If you use zsh, run this command and re-login:
 
-   * run ``apt install util-linux coreutils`` or equivalent if your system is not Debian-based.
+.. code-block:: bash
+
+    echo 'export PATH="/usr/local/opt/gnu-getopt/bin:$PATH"' >> ~/.zprofile
+    . ~/.zprofile
+
+* For Linux, run ``apt install util-linux coreutils`` or an equivalent if your system is
not Debian-based.
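
To confirm that the GNU version of ``getopt`` is the one found first on the PATH, a check along these lines may help. It is only a sketch: the function name ``has_gnu_getopt`` is hypothetical and not part of Breeze. It relies on the documented behaviour of enhanced ``getopt``, which prints nothing and exits with status 4 when called with ``--test``.

```shell
# Hypothetical helper (not part of Breeze): succeeds when the first
# `getopt` on PATH is the enhanced GNU/util-linux implementation.
has_gnu_getopt() {
  local status=0
  # Enhanced getopt exits with status 4 for --test; older
  # implementations exit with a different status.
  getopt --test >/dev/null 2>&1 || status=$?
  [ "$status" -eq 4 ]
}
```

For example, ``has_gnu_getopt && echo "GNU getopt found"`` can be run after re-login to verify the PATH change took effect.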
 
 Memory
 ------
 
-Minimum 4GB RAM is required to run the full ``docker`` environment.
+Minimum 4GB RAM is required to run the full Breeze environment.
+
+On macOS, 2GB of RAM are available for your Docker containers by default, but more memory
is recommended
+(4GB should be comfortable). For details see `Docker for Mac - Advanced tab <https://docs.docker.com/v17.12/docker-for-mac/#advanced-tab>`_.
+
+Airflow Directory Structure inside Docker
+-----------------------------------------
+
+When you are in the container, the following directories are used:
 
-On MacOS, the default 2GB of RAM available for your docker containers, but more memory is
recommended
-(4GB should be comfortable). For details see
-`Docker for Mac - Advanced tab <https://docs.docker.com/v17.12/docker-for-mac/#advanced-tab>`_
+.. code-block:: text
+
+  /opt/airflow - Contains sources of Airflow mounted from the host (AIRFLOW_SOURCES).
+  /root/airflow - Contains all the "dynamic" Airflow files (AIRFLOW_HOME), such as:
+      airflow.db - sqlite database in case sqlite is used;
+      dags - folder with non-test dags (test dags are in /opt/airflow/tests/dags);
+      logs - logs from Airflow executions;
+      unittest.cfg - unit test configuration generated when entering the environment;
+      webserver_config.py - webserver configuration generated when running Airflow in the
container.
+
+Note that when running in your local environment, the ``/root/airflow/logs`` folder is actually
mounted from your
+``logs`` directory in the Airflow sources, so all logs created in the container are automatically
visible in the host
+as well. Every time you enter the container, the ``logs`` directory is cleaned so that logs
do not accumulate.
 
-How Breeze works
-================
 
+Using the Airflow Breeze Environment
+=====================================
+
+Airflow Breeze is a bash script serving as a "swiss-army-knife" of Airflow testing. Under the
+hood it uses other scripts that you can also run manually if you have problems with running the Breeze
+environment.
+
+Breeze script allows performing the following tasks:
+
+* Enter an interactive environment when no command flags are specified (default behaviour).
+* Stop the interactive environment with ``-k``, ``--stop-environment`` command.
+* Build a Docker image with ``-b``, ``--build-only`` command.
+* Set up autocomplete for itself with ``-a``, ``--setup-autocomplete`` command.
+* Build documentation with ``-O``, ``--build-docs`` command.
+* Run static checks either for currently staged change or for all files with ``-S``, ``--static-check``
or ``-F``, ``--static-check-all-files`` commands.
+* Set up local virtualenv with ``-e``, ``--setup-virtualenv`` command.
+* Run a test target specified with ``-t``, ``--test-target`` command.
+* Execute an arbitrary command in the test environment with ``-x``, ``--execute-command``
command.
+* Execute an arbitrary docker-compose command with ``-d``, ``--docker-compose`` command.
+          
 Entering Breeze
 ---------------
 
-Your entry point for Airflow Breeze is `./breeze <./breeze>`_ script. You can run it
with ``--help``
+You enter the Breeze integration test environment by running the ``./breeze`` script. You
can run it with the ``--help``
 option to see the list of available flags. See `Airflow Breeze flags <#airflow-breeze-flags>`_
for details.
 
-You can also `Set up autocomplete <#setting-up-autocomplete>`_ for the command and
add the
-checked-out airflow repository to your PATH to run breeze without the ./ and from any directory.
+  .. code-block:: bash
 
-First time you run Breeze, it will pull and build local version of docker images.
-It will pull latest Airflow CI images from `Airflow DockerHub <https://hub.docker.com/r/apache/airflow>`_
-and use them to build your local docker images.
+   ./breeze
+
+The first time you run Breeze, it pulls and builds a local version of Docker images.
+It pulls the latest Airflow CI images from `Airflow DockerHub <https://hub.docker.com/r/apache/airflow>`_
+and uses them to build your local Docker images. Note that the first run (per python) might take up to 10 minutes
+on a fast connection to start. Subsequent runs should be much faster.
+
+Once you enter the environment, you are dropped into bash shell of the Airflow container
and you can run tests immediately.
+
+You can `set up autocomplete <#setting-up-autocomplete>`_ for commands and add the
+checked-out Airflow repository to your PATH to run Breeze without the ./ and from any directory.

 
 Stopping Breeze
 ---------------
 
 After starting up, the environment runs in the background and takes precious memory.
 You can always stop it via:
 
-.. code-block:: bash
+  .. code-block:: bash
 
     ./breeze --stop-environment
 
+Choosing a Breeze Environment
+-----------------------------
 
-Using the Airflow Breeze environment for testing
-================================================
+You can use additional ``breeze`` flags to customize your environment. For example, you can
specify a Python version to use, backend and a container environment for testing. With Breeze,
you can recreate the same environments as we have in matrix builds in Travis CI.
 
-Setting up autocomplete
------------------------
+For example, you can choose to run Python 3.6 tests with mysql as backend and in the Docker
environment as follows:
 
-The ``breeze`` command comes with built-in bash/zsh autocomplete for its flags. When you
start typing
-the command you can use <TAB> to show all the available switches
-nd to get autocompletion on typical values of parameters that you can use.
+  .. code-block:: bash
 
-You can setup auto-complete automatically by running:
+    ./breeze --python 3.6 --backend mysql --env docker
 
-.. code-block:: bash
+The choices you make are persisted in the ``./.build/`` cache directory so that the next time you use the
+``breeze`` script, it uses the values that were used previously. This way you do not have to specify them when you run the script. You can delete the ``.build/`` directory in case you want to restore the default settings.
 
-   ./breeze --setup-autocomplete
-
-You get autocomplete working when you re-enter the shell.
+The defaults when you run the Breeze environment are Python 3.6, sqlite, and docker.
 
-Zsh autocompletion is currently limited to only autocomplete flags. Bash autocompletion also
completes
-flag values (for example python version or static check name).
+Available Docker Environments
+..............................
 
-Entering the environment
-------------------------
+You can choose a container environment when you run Breeze with ``--env`` flag.
+Running the default ``docker`` environment takes a considerable amount of resources. You
can run a slimmed-down
+version of the environment - just the Apache Airflow container - by choosing ``bare`` environment
instead.
 
-You enter the integration test environment by running the ``./breeze`` script.
+The following environments are available:
 
-What happens next is the appropriate docker images are pulled, local sources are used to
build local version
-of the image and you are dropped into bash shell of the airflow container -
-with all necessary dependencies started up. Note that the first run (per python) might take
up to 10 minutes
-on a fast connection to start. Subsequent runs should be much faster.
+ * The ``docker`` environment (default): starts all dependencies required by a full integration test suite
+   (Postgres, Mysql, Celery, etc). This option is resource intensive so do not forget to
+   `stop the environment <#stopping-breeze>`_ when you are finished. This option is also RAM intensive
+   and can slow down your machine.
+ * The ``kubernetes`` environment: runs Airflow tests within a kubernetes cluster.
+ * The ``bare`` environment: runs Airflow in Docker without any external dependencies.
+   It only works for independent tests. You can only run it with the sqlite backend.
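
The constraint between environments and backends described above can be sketched as a small wrapper. The function name ``breeze_cmd`` is hypothetical and not part of Breeze. It only prints the command line it would run, using the ``--python``, ``--backend`` and ``--env`` flags and the defaults (Python 3.6, sqlite, docker) named in the diff.

```shell
# Hypothetical helper (not part of Breeze): prints a ./breeze command line
# and rejects the bare environment with any backend other than sqlite.
breeze_cmd() {
  local python="${1:-3.6}" backend="${2:-sqlite}" env="${3:-docker}"
  if [ "$env" = "bare" ] && [ "$backend" != "sqlite" ]; then
    echo "The bare environment only works with the sqlite backend." >&2
    return 1
  fi
  echo "./breeze --python $python --backend $backend --env $env"
}
```

Calling ``breeze_cmd 3.6 sqlite bare`` prints the command for the slimmed-down environment, while ``breeze_cmd 3.6 mysql bare`` returns a non-zero status.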
 
-.. code-block:: bash
 
-   ./breeze
+Cleaning Up the Environment
+---------------------------
 
-Once you enter the environment you are dropped into bash shell and you can run tests immediately.
+You may need to clean up your Docker environment occasionally. The images are quite big
+(1.5GB for both images needed for static code analysis and CI tests) and, if you often rebuild/update
+them, you may end up with some unused image data.
 
-Choosing environment
---------------------
+To clean up the Docker environment:
 
-You can choose the optional flags you need with ``breeze``
+1. `Stop Breeze <#stopping-breeze>`_ with ``./breeze --stop-environment``.
 
-You can specify for example python version to use, backend to use and environment
-for testing - you can recreate the same environments as we have in matrix builds in Travis
CI.
+2. Run the ``docker system prune`` command.
 
-For example you could choose to run python 3.6 tests with mysql as backend and in docker
-environment by:
+3. Run ``docker images --all`` and ``docker ps --all`` to verify that your Docker is clean.
 
-.. code-block:: bash
+   Both commands should return an empty list of images and containers respectively.
 
-   ./breeze --python 3.6 --backend mysql --env docker
+If you run into disk space errors, consider pruning your Docker images with the ``docker
system prune --all`` command. You may need to restart the Docker Engine before running this
command.
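
The clean-up steps above could be collected into one helper, sketched below. The names ``cleanup_breeze_docker``, ``run`` and the ``DRY_RUN`` variable are hypothetical and not part of Breeze. The commands themselves are the ones listed in the steps. Note that ``docker system prune`` asks for confirmation before deleting anything.

```shell
# Hypothetical helper (not part of Breeze). With DRY_RUN=1 the commands
# are printed instead of executed, which is useful for a quick review.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

cleanup_breeze_docker() {
  run ./breeze --stop-environment   # 1. stop the Breeze environment
  run docker system prune           # 2. remove unused Docker data
  run docker images --all           # 3. list the remaining images
  run docker ps --all               #    and the remaining containers
}
```

Running ``DRY_RUN=1 cleanup_breeze_docker`` from the Airflow source directory prints the four commands without touching Docker.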
 
-The choices you made are persisted in ``./.build/`` cache directory so that next time when
you use the
-``breeze`` script, it will use the values that were used previously. This way you do not
-have to specify them when you run the script. You can delete the ``.build/`` directory in
case you want to
-restore default settings.
+In case of disk space errors on macOS, increase the disk space available for Docker. See `Prerequisites <#prerequisites>`_ for details.
 
-The defaults when you run the environment are reasonable (python 3.6, sqlite, docker).
+Building the Images
+-------------------
 
-Mounting local sources to Breeze
---------------------------------
+You can manually trigger building the local images using the script:
 
-Important sources of airflow are mounted inside the ``airflow-testing`` container that you
enter,
-which means that you can continue editing your changes in the host in your favourite IDE
and have them
-visible in docker immediately and ready to test without rebuilding images. This can be disabled
by specifying
-``--skip-mounting-source-volume`` flag when running breeze, in which case you will have sources
-embedded in the container - and changes to those sources will not be persistent.
+.. code-block:: bash
 
+  ./scripts/ci/local_ci_build.sh
 
-After you run Breeze for the first time you will have an empty directory ``files`` in your
source code
-that will be mapped to ``/files`` in your docker container. You can pass any files there
you need
-to configure and run docker and they will not be removed between docker runs.
+The scripts that build the images are optimized to minimize the time needed to rebuild the
image when
+the source code of Airflow evolves. This means that if you already have the image locally
downloaded and built,
+the scripts will determine whether the rebuild is needed in the first place. Then the scripts
will make sure that minimal
+number of steps are executed to rebuild parts of the image (for example, PIP dependencies)
and will give
+you an image consistent with the one used during Continuous Integration.
 
-Running tests in Airflow Breeze
--------------------------------
+Pulling the Latest Images
 
 Review comment:
   I merged the section on pulling the latest images (which describes a breeze flag) with the section on force-pulling the images (which describes a script). Please check whether I did that correctly.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
