airflow-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Edgar Rodriguez (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AIRFLOW-1495) TaskInstanceModelView using unindexed column for order_by on default
Date Tue, 08 Aug 2017 18:56:00 GMT

     [ https://issues.apache.org/jira/browse/AIRFLOW-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Edgar Rodriguez updated AIRFLOW-1495:
-------------------------------------
    Description: 
Using unindexed columns for ordering query results triggers sorting in the database, which
makes for slower queries.
Solution is to use an indexed column or add an index on the desired columns if it doesn't
exist, see: https://github.com/apache/incubator-airflow/blob/master/airflow/www/views.py#L2489

When checking for indices on task_instance, none is found for job_id:
{code}
mysql> show index from task_instance;
+---------------+------------+--------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table         | Non_unique | Key_name     | Seq_in_index | Column_name     | Collation |
Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------------+------------+--------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| task_instance |          0 | PRIMARY      |            1 | task_id         | A         |
         71 |     NULL | NULL   |      | BTREE      |         |               |
| task_instance |          0 | PRIMARY      |            2 | dag_id          | A         |
         80 |     NULL | NULL   |      | BTREE      |         |               |
| task_instance |          0 | PRIMARY      |            3 | execution_date  | A         |
      30056 |     NULL | NULL   |      | BTREE      |         |               |
| task_instance |          1 | ti_dag_state |            1 | dag_id          | A         |
         18 |     NULL | NULL   |      | BTREE      |         |               |
| task_instance |          1 | ti_dag_state |            2 | state           | A         |
         28 |     NULL | NULL   | YES  | BTREE      |         |               |
| task_instance |          1 | ti_pool      |            1 | pool            | A         |
          1 |     NULL | NULL   | YES  | BTREE      |         |               |
| task_instance |          1 | ti_pool      |            2 | state           | A         |
          3 |     NULL | NULL   | YES  | BTREE      |         |               |
| task_instance |          1 | ti_pool      |            3 | priority_weight | A         |
         16 |     NULL | NULL   | YES  | BTREE      |         |               |
| task_instance |          1 | ti_state_lkp |            1 | dag_id          | A         |
         18 |     NULL | NULL   |      | BTREE      |         |               |
| task_instance |          1 | ti_state_lkp |            2 | task_id         | A         |
         82 |     NULL | NULL   |      | BTREE      |         |               |
| task_instance |          1 | ti_state_lkp |            3 | execution_date  | A         |
      28037 |     NULL | NULL   |      | BTREE      |         |               |
| task_instance |          1 | ti_state_lkp |            4 | state           | A         |
      29354 |     NULL | NULL   | YES  | BTREE      |         |               |
| task_instance |          1 | ti_state     |            1 | state           | A         |
          3 |     NULL | NULL   | YES  | BTREE      |         |               |
+---------------+------------+--------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
{code}

So, when executing a query similar to the one for the view paging, the query explain shows
the required filesort:
{code}
mysql> explain select * from task_instance order by job_id limit 500 offset 500;
+----+-------------+---------------+------------+------+---------------+------+---------+------+-------+----------+----------------+
| id | select_type | table         | partitions | type | possible_keys | key  | key_len |
ref  | rows  | filtered | Extra          |
+----+-------------+---------------+------------+------+---------------+------+---------+------+-------+----------+----------------+
|  1 | SIMPLE      | task_instance | NULL       | ALL  | NULL          | NULL | NULL    |
NULL | 30119 |   100.00 | Using filesort |
+----+-------------+---------------+------------+------+---------------+------+---------+------+-------+----------+----------------+
{code}

  was:
Using unindexed columns for query results order triggers sorting in the database, which makes
for slower queries.
Solution is to use an indexed column or add an index on the desired columns if it doesn't
exist, see: https://github.com/apache/incubator-airflow/blob/master/airflow/www/views.py#L2489

When checking for indices on task_instance, none is found for job_id:
{code}
mysql> show index from task_instance;
+---------------+------------+--------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table         | Non_unique | Key_name     | Seq_in_index | Column_name     | Collation |
Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------------+------------+--------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| task_instance |          0 | PRIMARY      |            1 | task_id         | A         |
         71 |     NULL | NULL   |      | BTREE      |         |               |
| task_instance |          0 | PRIMARY      |            2 | dag_id          | A         |
         80 |     NULL | NULL   |      | BTREE      |         |               |
| task_instance |          0 | PRIMARY      |            3 | execution_date  | A         |
      30056 |     NULL | NULL   |      | BTREE      |         |               |
| task_instance |          1 | ti_dag_state |            1 | dag_id          | A         |
         18 |     NULL | NULL   |      | BTREE      |         |               |
| task_instance |          1 | ti_dag_state |            2 | state           | A         |
         28 |     NULL | NULL   | YES  | BTREE      |         |               |
| task_instance |          1 | ti_pool      |            1 | pool            | A         |
          1 |     NULL | NULL   | YES  | BTREE      |         |               |
| task_instance |          1 | ti_pool      |            2 | state           | A         |
          3 |     NULL | NULL   | YES  | BTREE      |         |               |
| task_instance |          1 | ti_pool      |            3 | priority_weight | A         |
         16 |     NULL | NULL   | YES  | BTREE      |         |               |
| task_instance |          1 | ti_state_lkp |            1 | dag_id          | A         |
         18 |     NULL | NULL   |      | BTREE      |         |               |
| task_instance |          1 | ti_state_lkp |            2 | task_id         | A         |
         82 |     NULL | NULL   |      | BTREE      |         |               |
| task_instance |          1 | ti_state_lkp |            3 | execution_date  | A         |
      28037 |     NULL | NULL   |      | BTREE      |         |               |
| task_instance |          1 | ti_state_lkp |            4 | state           | A         |
      29354 |     NULL | NULL   | YES  | BTREE      |         |               |
| task_instance |          1 | ti_state     |            1 | state           | A         |
          3 |     NULL | NULL   | YES  | BTREE      |         |               |
+---------------+------------+--------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
{code}

So, when executing a query similar to the one for the view paging, the query explain shows
the required filesort:
{code}
mysql> explain select * from task_instance order by job_id limit 500 offset 500;
+----+-------------+---------------+------------+------+---------------+------+---------+------+-------+----------+----------------+
| id | select_type | table         | partitions | type | possible_keys | key  | key_len |
ref  | rows  | filtered | Extra          |
+----+-------------+---------------+------------+------+---------------+------+---------+------+-------+----------+----------------+
|  1 | SIMPLE      | task_instance | NULL       | ALL  | NULL          | NULL | NULL    |
NULL | 30119 |   100.00 | Using filesort |
+----+-------------+---------------+------------+------+---------------+------+---------+------+-------+----------+----------------+
{code}


> TaskInstanceModelView using unindexed column for order_by on default
> --------------------------------------------------------------------
>
>                 Key: AIRFLOW-1495
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1495
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: webserver
>            Reporter: Edgar Rodriguez
>
> Using unindexed columns for ordering query results triggers sorting in the database,
which makes for slower queries.
> Solution is to use an indexed column or add an index on the desired columns if it doesn't
exist, see: https://github.com/apache/incubator-airflow/blob/master/airflow/www/views.py#L2489
> When checking for indices on task_instance, none is found for job_id:
> {code}
> mysql> show index from task_instance;
> +---------------+------------+--------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
> | Table         | Non_unique | Key_name     | Seq_in_index | Column_name     | Collation
| Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
> +---------------+------------+--------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
> | task_instance |          0 | PRIMARY      |            1 | task_id         | A    
    |          71 |     NULL | NULL   |      | BTREE      |         |               |
> | task_instance |          0 | PRIMARY      |            2 | dag_id          | A    
    |          80 |     NULL | NULL   |      | BTREE      |         |               |
> | task_instance |          0 | PRIMARY      |            3 | execution_date  | A    
    |       30056 |     NULL | NULL   |      | BTREE      |         |               |
> | task_instance |          1 | ti_dag_state |            1 | dag_id          | A    
    |          18 |     NULL | NULL   |      | BTREE      |         |               |
> | task_instance |          1 | ti_dag_state |            2 | state           | A    
    |          28 |     NULL | NULL   | YES  | BTREE      |         |               |
> | task_instance |          1 | ti_pool      |            1 | pool            | A    
    |           1 |     NULL | NULL   | YES  | BTREE      |         |               |
> | task_instance |          1 | ti_pool      |            2 | state           | A    
    |           3 |     NULL | NULL   | YES  | BTREE      |         |               |
> | task_instance |          1 | ti_pool      |            3 | priority_weight | A    
    |          16 |     NULL | NULL   | YES  | BTREE      |         |               |
> | task_instance |          1 | ti_state_lkp |            1 | dag_id          | A    
    |          18 |     NULL | NULL   |      | BTREE      |         |               |
> | task_instance |          1 | ti_state_lkp |            2 | task_id         | A    
    |          82 |     NULL | NULL   |      | BTREE      |         |               |
> | task_instance |          1 | ti_state_lkp |            3 | execution_date  | A    
    |       28037 |     NULL | NULL   |      | BTREE      |         |               |
> | task_instance |          1 | ti_state_lkp |            4 | state           | A    
    |       29354 |     NULL | NULL   | YES  | BTREE      |         |               |
> | task_instance |          1 | ti_state     |            1 | state           | A    
    |           3 |     NULL | NULL   | YES  | BTREE      |         |               |
> +---------------+------------+--------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
> {code}
> So, when executing a query similar to the one for the view paging, the query explain
shows the required filesort:
> {code}
> mysql> explain select * from task_instance order by job_id limit 500 offset 500;
> +----+-------------+---------------+------------+------+---------------+------+---------+------+-------+----------+----------------+
> | id | select_type | table         | partitions | type | possible_keys | key  | key_len
| ref  | rows  | filtered | Extra          |
> +----+-------------+---------------+------------+------+---------------+------+---------+------+-------+----------+----------------+
> |  1 | SIMPLE      | task_instance | NULL       | ALL  | NULL          | NULL | NULL
   | NULL | 30119 |   100.00 | Using filesort |
> +----+-------------+---------------+------------+------+---------------+------+---------+------+-------+----------+----------------+
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message