airflow-commits mailing list archives

From "Apache Spark (JIRA)" <>
Subject [jira] [Commented] (AIRFLOW-855) Security - Airflow SQLAlchemy PickleType Allows for Code Execution
Date Sun, 02 Sep 2018 18:09:02 GMT


Apache Spark commented on AIRFLOW-855:

User 'amaliujia' has created a pull request for this issue:

> Security - Airflow SQLAlchemy PickleType Allows for Code Execution
> ------------------------------------------------------------------
>                 Key: AIRFLOW-855
>                 URL:
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: Rui Wang
>            Assignee: Rui Wang
>            Priority: Major
>         Attachments: test_dag.txt
> Impact: Anyone able to modify the application's underlying database, or a computer where
certain DAG tasks are executed, may execute arbitrary code on the Airflow host.
> Location: The XCom class in /airflow-internal-master/airflow/
> Description: Airflow uses the SQLAlchemy object-relational mapper (ORM) for database-agnostic,
object-oriented manipulation of application data. Database tables and values are expressed
as Python classes, and the ORM transparently manipulates the underlying database when these
structures are accessed programmatically.
> Airflow defines the following class as the ORM model for an XCom:
> {code}
> class XCom(Base): 
>   """
>   Base class for XCom objects. 
>   """
>   __tablename__ = "xcom"
>   id = Column(Integer, primary_key=True) 
>   key = Column(String(512))
>   value = Column(PickleType(pickler=dill)) 
>   timestamp = Column(
>     DateTime,, nullable=False) 
>   execution_date = Column(DateTime, nullable=False)
> {code}
> XComs are used for inter-task communication. Their values are either defined in a DAG
or are the return value of the python_callable() function or the task's execute() method,
executed on a remote host. According to this model, XCom values are of the PickleType, meaning
that objects assigned to the value column are transparently serialized when written and
deserialized when read. Deserializing user-controlled pickle objects allows for the execution
of arbitrary code. This means that "slaves" (where DAG code is executed) can compromise
"masters" (where DAGs are defined in code) by returning an object that, when serialized
and subsequently deserialized, causes remote code execution. The same result can be triggered
by anyone who has write access to this portion of the database.
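> The mechanism can be illustrated with a harmless sketch. It assumes the standard-library
pickle module rather than dill (the pickler named in the model above), and the Payload class
is hypothetical. The callable chosen here only reads the process ID; it stands in for whatever
an attacker would choose.

```python
import os
import pickle


class Payload:
    """Hypothetical object whose pickled form names a callable to run on load."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair; pickle.loads later calls
        # callable(*args) and returns the result instead of a Payload instance.
        return (os.getpid, ())


# Serialization: roughly what happens when the value is written to the column.
blob = pickle.dumps(Payload())

# Deserialization: the callable runs in the process that reads the value back.
result = pickle.loads(blob)
```

After the round trip, result holds the reading process's ID rather than a Payload object, which shows that the reader ran a function selected by whoever produced the bytes.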
> Note: NCC Group plans to meet with developers in the coming days to discuss this finding,
and it will be updated to reflect any additional insight provided by this meeting.
> Reproduction Steps:
> 1. Configure a local instance of Airflow.
> 2. Insert the attached DAG into your AIRFLOW_HOME/dags directory.
> This example models a slave returning a malicious object to a task's python_callable
by creating a portable object (with __reduce__) containing a reverse shell and pushing it
as an XCom's value. This value is serialized upon xcom_push and deserialized upon xcom_pull.
> In an actual exploit scenario, this value would be a DAG function's return value, as assigned
by code within the function, executing on a malicious remote machine.
> 3. Start a netcat listener on your machine's port 4444.
> 4. Execute this task from the command line with airflow run push 2016-11-17. Note that
your netcat listener has received a shell connect-back.
> Remediation: Consider the use of a custom SQLAlchemy data type that performs this transparent
serialization and deserialization, but with JSON (a text-based exchange format), rather than
pickles (which may contain code).
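> A minimal sketch of the JSON approach follows, using only the standard library. The
function names serialize_value and deserialize_value and the Payload class are hypothetical;
in a custom SQLAlchemy type, logic like this would presumably sit in the process_bind_param
and process_result_value hooks of a TypeDecorator subclass. This is not the fix adopted by
the project.

```python
import json


def serialize_value(value):
    """Encode a value as JSON text before it is stored.

    json.dumps raises TypeError for anything that is not plain JSON data,
    so arbitrary objects cannot be written.
    """
    return json.dumps(value)


def deserialize_value(text):
    """Decode stored JSON text into plain Python data.

    json.loads builds only dicts, lists, strings, numbers, booleans and None;
    it does not call callables named in the input.
    """
    return json.loads(text)


class Payload:
    """Hypothetical object that would run a callable if it were unpickled."""

    def __reduce__(self):
        return (print, ("this would run under pickle",))


# Plain data survives the round trip unchanged.
stored = serialize_value({"rows": 3, "status": "ok"})
restored = deserialize_value(stored)

# An object carrying a __reduce__ payload is rejected at write time.
try:
    serialize_value(Payload())
    rejected = False
except TypeError:
    rejected = True
```

The trade-off is that only JSON-compatible values can be exchanged. Tuples, for example, come back as lists, and custom classes need an explicit conversion to and from plain data.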

This message was sent by Atlassian JIRA
