airflow-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [airflow] mik-laj commented on a change in pull request #6983: [AIRFLOW-6414] configuration docs
Date Thu, 02 Jan 2020 21:38:24 GMT
mik-laj commented on a change in pull request #6983: [AIRFLOW-6414] configuration docs
URL: https://github.com/apache/airflow/pull/6983#discussion_r362639560
 
 

 ##########
 File path: docs/howto/configurations-ref.rst
 ##########
 @@ -0,0 +1,1141 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+Configuration Reference
+=======================
+
+.. _config-ref/core:
+
+``[core]``
+^^^^^^^^^^
+
+dags_folder
+***********
+The folder where your Airflow pipelines live, most likely a subfolder in a code repository.
This path must be absolute.
+
+hostname_callable
+*****************
+
+A path to a callable that resolves the hostname. The format is
"package:function". For example, the default value "socket:getfqdn" means that the result of getfqdn()
from the "socket" package will be used as the hostname. The specified function must not require
any arguments. If using an IP address as the hostname is preferred, use the value "airflow.utils.net:get_host_ip_address".
+
+default_timezone
+****************
+
+Default timezone in case supplied date times are naive. Can be utc (default), system, or
any IANA timezone string (e.g. Europe/Amsterdam)
+
+executor
+*********
+
+The executor class that airflow should use. Choices include SequentialExecutor, LocalExecutor,
CeleryExecutor, DaskExecutor, KubernetesExecutor
+
+
+sql_alchemy_conn
+****************
+
+The SqlAlchemy connection string to the metadata database. SqlAlchemy supports many different
database engines; more information is available on their website.
+
+sql_engine_encoding
+*******************
+
+The encoding for the databases
+
+sql_alchemy_pool_enabled
+************************
+
+Whether SqlAlchemy should pool database connections.
+
+sql_alchemy_pool_size
+*********************
+The SqlAlchemy pool size is the maximum number of database connections in the pool. 0 indicates
no limit.
+
+sql_alchemy_max_overflow
+************************
+The maximum overflow size of the pool.  When the number of checked-out connections reaches
the size set in pool_size, additional connections will be returned up to this limit.  When
those additional connections are returned to the pool, they are disconnected and discarded.
 It follows then that the total number of simultaneous connections the pool will allow is
pool_size + max_overflow, and the total number of "sleeping" connections the pool will allow
is pool_size.  max_overflow can be set to -1 to indicate no overflow limit; no limit will
be placed on the total number of concurrent connections. Defaults to 10.
+
+sql_alchemy_pool_recycle
+************************
+The SqlAlchemy pool recycle is the number of seconds a connection can be idle in the pool
before it is invalidated. This config does not apply to sqlite. If the number of DB connections
is ever exceeded, a lower config value will allow the system to recover faster.
+
+sql_alchemy_pool_pre_ping
+*************************
+Check connection at the start of each connection pool checkout.  Typically, this is a simple
statement like "SELECT 1".  More information here: https://docs.sqlalchemy.org/en/13/core/pooling.html#disconnect-handling-pessimistic
+
+sql_alchemy_schema
+******************
+The schema to use for the metadata database. SqlAlchemy supports databases with the concept
of multiple schemas.
+
+sql_alchemy_connect_args
+************************
+
+Import path for connect args in SqlAlchemy. Defaults to an empty dict.  This is useful when
you want to configure db engine args that SqlAlchemy won't parse in the connection string.  See
https://docs.sqlalchemy.org/en/13/core/engines.html#sqlalchemy.create_engine.params.connect_args
+
+parallelism
+***********
+
+The amount of parallelism as a setting to the executor. This defines the max number of task
instances that should run simultaneously on this airflow installation
+
+dag_concurrency
+***************
+
+The number of task instances allowed to run concurrently by the scheduler
+
+dags_are_paused_at_creation
+***************************
+
+Whether DAGs are paused by default at creation.
+
+max_active_runs_per_dag
+***********************
+
+The maximum number of active DAG runs per DAG
+
+load_examples
+*************
+
+Whether to load the examples that ship with Airflow. It's good to get started, but you probably
want to set this to False in a production environment
+
+plugins_folder
+******************
+
+Where your Airflow plugins are stored
+
+fernet_key
+**********
+
+Secret key to save connection passwords in the db
+
+donot_pickle
+************
+
+Whether to disable pickling dags
+
+dagbag_import_timeout
+*********************
+
+How long before timing out a python file import
+
+dag_file_processor_timeout
+**************************
+
+How long before timing out a DagFileProcessor, which processes a dag file
+
+task_runner
+***********
+
+The class to use for running task instances in a subprocess.
+Can be used to de-elevate a sudo user running Airflow when executing tasks.
+
+default_impersonation
+*********************
+
+If set, tasks without a ``run_as_user`` argument will be run with this user
+
+security
+********
+
+What security module to use (for example kerberos).
+
+secure_mode
+***********
+
+If set to False, enables some insecure features like Charts and Ad Hoc Queries.  In 2.0 this will
default to True.
+
+unit_test_mode
+**************
+
+Turn unit test mode on (overwrites many configuration options with test values at runtime)
+
+enable_xcom_pickling
+********************
+
+Whether to enable pickling for xcom (note that this is insecure and allows for RCE exploits).
This will be deprecated in Airflow 2.0 (be forced to False).
+
+killed_task_cleanup_time
+************************
+
+When a task is killed forcefully, this is the amount of time in seconds that it has to clean up
after it is sent a SIGTERM, before it is sent a SIGKILL.
+
+dag_run_conf_overrides_params
+*****************************
+
+Whether to override params with dag_run.conf. If you pass some key-value pairs through ``airflow
dags backfill -c`` or ``airflow dags trigger -c``, the key-value pairs will override the existing
ones in params.
+
+worker_precheck
+***************
+
+Worker initialisation check to validate Metadata Database connection
+
+dag_discovery_safe_mode
+***********************
+
+When discovering DAGs, ignore any files that don't contain the strings ``DAG`` and ``airflow``.
+
+default_task_retries
+********************
+
+The number of retries each task is going to have by default. Can be overridden at dag or
task level.
+
+store_serialized_dags
+*********************
+
+Whether to serialise DAGs and persist them in the DB.  If set to True, the Webserver reads from the
DB instead of parsing DAG files. More details: https://airflow.apache.org/docs/stable/dag-serialization.html
+
+min_serialized_dag_update_interval
+**********************************
+
+Serialized DAGs cannot be updated more often than this minimum interval, to reduce the database write
rate.
+
+check_slas
+**********
+
+On each dagrun check against defined SLAs
+
+.. _config-ref/logging:
+
+[logging]
+^^^^^^^^^
+
+base_log_folder
+***************
+The folder where Airflow should store its log files. This path must be absolute.
+
+remote_logging
+**************
+Airflow can store logs remotely in AWS S3, Google Cloud Storage or Elasticsearch. Users
must supply an Airflow connection id that provides access to the storage location. If remote_logging
is set to true, see UPDATING.md for additional configuration requirements.
+
+remote_log_conn_id
+******************
+
+remote_base_log_folder
+**********************
+
+encrypt_s3_logs
+***************
+
+logging_level
+*************
+
+fab_logging_level
+*****************
+
+Logging class
+*************
+
+Specify the class that will specify the logging configuration
+This class has to be on the python classpath
+
+logging_config_class
+********************
+
+Log format
+**********
+
+Colour the logs when the controlling terminal is a TTY.
+
+colored_console_log
+*******************
+
+colored_log_format
+******************
+
+colored_formatter_class
+***********************
+
+
+log_format
+**********
+
+simple_log_format
+*****************
+
+
+task_log_prefix_template
+************************
+
+Specify prefix pattern like mentioned below with stream handler TaskHandlerWithCustomFormatter
+
+
+log_filename_template
+*********************
+Log filename format
+
+log_processor_filename_template
+*******************************
+
+dag_processor_manager_log_location
+**********************************
+
+Name of handler to read task instance logs. Default to use task handler.
+
+task_log_reader
+***************
+
+cli
+***
+
+api_client
+**********
+
+endpoint_url
+************
+
+In what way should the cli access the API. The LocalClient will use the database directly,
while the json_client will use the api running on the webserver
+
+If you set web_server_url_prefix, do NOT forget to append it here, ex: endpoint_url. So api
will look like: http://localhost:8080/myroot/api/experimental/...
+
+.. _config-ref/debug:
+
+[debug]
+^^^^^^^
+
+fail_fast
+*********
+Used only with DebugExecutor. If set to True, the DAG will fail at the first failed task. Helpful
for debugging purposes.
+
+.. _config-ref/api:
+
+[api]
+^^^^^
+auth_backend
+************
+How to authenticate users of the API
+
+lineage
+
+backend
+*******
+
+What lineage backend to use.
+
+.. _config-ref/atlas:
+
+[atlas]
+^^^^^^^
+sasl_enabled
+************
+host
+****
+port
+****
+username
+********
+password
+********
+
+.. _config-ref/operators:
+
+[operators]
+^^^^^^^^^^^
+
+default_owner
+*************
+The default owner assigned to each new operator, unless provided explicitly or passed via
``default_args``
+
+default_cpus
+************
+default_ram
+***********
+default_disk
+************
+default_gpus
+************
+
+allow_illegal_arguments
+***********************
+Whether it is allowed to pass additional/unused arguments (args, kwargs) to the BaseOperator operator.
If set to False, an exception will be thrown; otherwise only a console message will be displayed.
+
+.. _config-ref/hive:
+
+[hive]
+^^^^^^
+
+default_hive_mapred_queue
+*************************
+Default mapreduce queue for HiveOperator tasks
+
+mapred_job_name_template
+************************
+Template for mapred_job_name in HiveOperator, supports the following named parameters: hostname,
dag_id, task_id, execution_date
+
+.. _config-ref/webserver:
+
+[webserver]
+^^^^^^^^^^^
+
+base_url
+********
+The base URL of your website, as Airflow cannot guess what domain or cname you are using.
This is used in automated emails that Airflow sends to point links to the right web server.
+
+web_server_host
+***************
+The IP address specified when starting the web server
+
+web_server_port
+***************
+
+The port on which to run the web server
+
+web_server_ssl_cert
+*******************
+Paths to the SSL certificate and key for the web server. When both are provided SSL will
be enabled. This does not change the web server port.
+
+web_server_ssl_key
+******************
+
+web_server_master_timeout
+*************************
+Number of seconds the webserver waits before killing a gunicorn master that doesn't respond
+
+web_server_worker_timeout
+*************************
+
+Number of seconds the gunicorn webserver waits before timing out on a worker
+
+worker_refresh_batch_size
+*************************
+Number of workers to refresh at a time. When set to 0, worker refresh is disabled. When nonzero,
airflow periodically refreshes webserver workers by bringing up new ones and killing old ones.
+
+worker_refresh_interval
+***********************
+
+Number of seconds to wait before refreshing a batch of workers.
+secret_key
 
 Review comment:
   ```suggestion
   
   secret_key
   ```
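
As an aside on the ``hostname_callable`` entry in the diff above: the "package:function" format can be illustrated with a minimal sketch. This is not Airflow's actual implementation; the helper name `resolve_hostname` is hypothetical, and the only assumption taken from the text is that the value splits on a colon into an importable module and a zero-argument function.

```python
import importlib
import socket


def resolve_hostname(callable_path: str = "socket:getfqdn") -> str:
    """Hypothetical helper: resolve a hostname from a "package:function" string."""
    module_name, _, function_name = callable_path.partition(":")
    if not module_name or not function_name:
        raise ValueError(f"expected 'package:function', got {callable_path!r}")
    # Import the package named before the colon and look up the function after it.
    module = importlib.import_module(module_name)
    func = getattr(module, function_name)
    # The configured function is called with no arguments, as the docs require.
    return func()


if __name__ == "__main__":
    # Both lines should print the same value.
    print(resolve_hostname("socket:gethostname"))
    print(socket.gethostname())
```

A malformed value such as "socket" (no colon) raises `ValueError` in this sketch; how Airflow itself reports a bad setting is not stated in the diff.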

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
