airflow-commits mailing list archives

From "Rui Wang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AIRFLOW-855) Security - Airflow SQLAlchemy PickleType Allows for Code Execution
Date Thu, 09 Feb 2017 20:22:42 GMT
Rui Wang created AIRFLOW-855:
--------------------------------

             Summary: Security - Airflow SQLAlchemy PickleType Allows for Code Execution
                 Key: AIRFLOW-855
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-855
             Project: Apache Airflow
          Issue Type: Bug
            Reporter: Rui Wang
         Attachments: test_dag.txt

Impact: Anyone able to modify the application's underlying database, or a computer where certain
DAG tasks are executed, may execute arbitrary code on the Airflow host.
Location: The XCom class in /airflow-internal-master/airflow/models.py
Description: Airflow uses the SQLAlchemy object-relational mapper (ORM) to allow database-agnostic,
object-oriented manipulation of application data. Database tables and their values are expressed
as Python classes, and the ORM transparently manipulates the underlying database when these
structures are accessed programmatically.
Airflow defines the following class as the XCom ORM model:
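```python
import dill
from sqlalchemy import Column, DateTime, Integer, PickleType, String, func
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class XCom(Base):
    """
    Base class for XCom objects.
    """
    __tablename__ = "xcom"

    id = Column(Integer, primary_key=True)
    key = Column(String(512))
    # Values are pickled with dill when written and unpickled when read.
    value = Column(PickleType(pickler=dill))
    timestamp = Column(
        DateTime, default=func.now(), nullable=False)
    execution_date = Column(DateTime, nullable=False)
```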
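SQLAlchemy's PickleType calls the configured pickler's dumps() when a row is written and its loads() when the row is read, so the value column holds nothing but serialized bytes. The following is a simplified, stdlib-only stand-in for that behavior, not SQLAlchemy's implementation: it uses the standard pickle module instead of dill, an in-memory sqlite3 table instead of Airflow's metadata database, and hypothetical helpers push_value/pull_value in place of XCom's real methods.

```python
import pickle
import sqlite3

# In-memory stand-in for the xcom table; the value column stores raw bytes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xcom (id INTEGER PRIMARY KEY, key TEXT, value BLOB)")


def push_value(key, value):
    """Hypothetical helper: serialize on write, as PickleType does."""
    conn.execute(
        "INSERT INTO xcom (key, value) VALUES (?, ?)",
        (key, pickle.dumps(value)),
    )


def pull_value(key):
    """Hypothetical helper: deserialize on read, as PickleType does."""
    row = conn.execute("SELECT value FROM xcom WHERE key = ?", (key,)).fetchone()
    # Whatever bytes are stored in the column go straight to pickle.loads().
    return pickle.loads(row[0])


push_value("return_value", {"rows": 3, "files": ["a.csv", "b.csv"]})
print(pull_value("return_value"))  # {'rows': 3, 'files': ['a.csv', 'b.csv']}
```

Because the reader trusts whatever bytes are in the column, anyone who can write to this table, or who controls the value being pushed, controls what pull_value() unpickles.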
XComs are used for inter-task communication. Their values are either defined in a DAG or are the
return value of a python_callable() function or a task's execute() method, which may execute on
a remote host. Because the value column is a PickleType, objects assigned to it are transparently
serialized when written and deserialized when read. Deserializing a user-controlled pickle allows
the execution of arbitrary code. This means that "slaves" (where DAG code is executed) can
compromise "masters" (where DAGs are defined in code) by returning an object whose
deserialization runs attacker-chosen code. The same attack can be triggered by anyone with write
access to the xcom table in the database.
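The root cause can be shown without Airflow at all: an object's __reduce__ method may name any importable callable plus its arguments, and pickle.loads() calls that callable while reconstructing the object. A minimal, harmless sketch using the standard pickle module (dill extends pickle and honors __reduce__ the same way); the class MaliciousXComValue and the XCOM_DEMO environment variable are hypothetical, and where a real payload would open a reverse shell, this one only sets that variable.

```python
import os
import pickle


class MaliciousXComValue:
    """Hypothetical payload class; any picklable object can define __reduce__."""

    def __reduce__(self):
        # pickle stores (callable, args); loading calls exec(<code>) to
        # "rebuild" the object. This demo only sets an environment variable.
        return (exec, ("import os; os.environ['XCOM_DEMO'] = 'executed'",))


# Slave side: the value is serialized, as on xcom_push.
payload = pickle.dumps(MaliciousXComValue())

# Master side: the value is deserialized, as on xcom_pull.
result = pickle.loads(payload)

print(os.environ.get("XCOM_DEMO"))  # prints "executed"
print(result)  # None: exec's return value, not a MaliciousXComValue
```

Note that the reading side never has to import or trust MaliciousXComValue; the pickle stream only references the builtin exec, so the code runs on whichever host unpickles the value.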
Note: NCC Group plans to meet with developers in the coming days to discuss this finding,
and it will be updated to reflect any additional insight provided by this meeting.
Reproduction Steps:
1. Configure a local instance of Airflow.
2. Insert the attached DAG into your AIRFLOW_HOME/dags directory.
This example models a slave returning a malicious object: the task's python_callable creates
a picklable object (one that defines __reduce__) that opens a reverse shell, and pushes it as
an XCom value. This value is serialized upon xcom_push and deserialized upon xcom_pull.
In an actual exploit scenario, this value would be the DAG function's return value, assigned
by code within the function executing on a malicious remote machine.
3. Start a netcat listener on port 4444 of your machine.
4. Execute this task from the command line with airflow run push 2016-11-17, and observe that
your netcat listener receives a connect-back shell.
Remediation: Consider using a custom SQLAlchemy data type that performs the same transparent
serialization and deserialization, but with JSON (a text-based data-interchange format) rather
than pickle (whose payloads can execute code when loaded).
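A minimal sketch of the suggested remediation, assuming SQLAlchemy 1.4 or later: a TypeDecorator that stores values as JSON text. JSONEncodedValue and the trimmed-down SafeXCom model are hypothetical names for illustration, not Airflow code, and an in-memory SQLite database stands in for Airflow's metadata database.

```python
import json

from sqlalchemy import Column, Integer, String, Text, create_engine
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.types import TypeDecorator


class JSONEncodedValue(TypeDecorator):
    """Hypothetical column type: stores values as JSON text instead of pickles."""

    impl = Text
    cache_ok = True  # stateless type, so SQLAlchemy may cache statements using it

    def process_bind_param(self, value, dialect):
        # Runs on write. json.dumps raises TypeError for arbitrary objects, so
        # only plain data (dict, list, str, int, float, bool, None) is stored.
        return None if value is None else json.dumps(value)

    def process_result_value(self, value, dialect):
        # Runs on read. json.loads only builds plain data and never calls code.
        return None if value is None else json.loads(value)


Base = declarative_base()


class SafeXCom(Base):
    # Hypothetical, trimmed-down model for illustration only.
    __tablename__ = "xcom"
    id = Column(Integer, primary_key=True)
    key = Column(String(512))
    value = Column(JSONEncodedValue())


engine = create_engine("sqlite://")  # in-memory SQLite database
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(SafeXCom(key="return_value", value={"rows": 3, "ok": True}))
    session.commit()
    stored = session.query(SafeXCom).filter_by(key="return_value").one().value

print(stored)  # {'rows': 3, 'ok': True}

# Arbitrary objects (including __reduce__ payloads) are rejected on write.
try:
    JSONEncodedValue().process_bind_param(object(), None)
except TypeError as exc:
    print("rejected:", exc)
```

The trade-off is that JSON only round-trips plain data: tuples come back as lists, and values such as datetime objects must be converted explicitly before being pushed, since json.dumps raises TypeError for them.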



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
