airflow-commits mailing list archives

From "Paul Yang (JIRA)" <>
Subject [jira] [Created] (AIRFLOW-243) Use a more efficient Thrift call for HivePartitionSensor
Date Tue, 14 Jun 2016 22:53:29 GMT
Paul Yang created AIRFLOW-243:

             Summary: Use a more efficient Thrift call for HivePartitionSensor
                 Key: AIRFLOW-243
             Project: Apache Airflow
          Issue Type: Improvement
          Components: operators
    Affects Versions: Airflow 2.0
            Reporter: Paul Yang
            Assignee: Li Xuanji
            Priority: Minor
             Fix For: Airflow 2.0

The {{HivePartitionSensor}} uses the `get_partitions_by_filter` Thrift call, which can result
in expensive SQL queries for tables that have many partitions and are partitioned by
multiple keys. We've seen our metastore DB get hammered by these sensors, resulting in service
degradation for other metastore users.

The {{MetastorePartitionSensor}} is efficient, but it can result in too many connections to
the metastore DB.

An alternative is to use the `get_partition_by_name` Thrift call, which translates into more
efficient SQL queries. Because connections are pooled on the Thrift server, the DB won't
get overloaded as it can with the {{MetastorePartitionSensor}}. The semantics of the arguments
will change (an exact partition name instead of a filter expression), so either a new argument
needs to be introduced or a new operator needs to be created.
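
A rough sketch of what a name-based check could look like follows. It is an illustration, not
a proposed patch: `partition_name`, `partition_exists`, `FakeMetastoreClient`, and the local
`NoSuchObjectException` class are hypothetical stand-ins so the snippet runs without a
metastore. It assumes the client exposes `get_partition_by_name(db_name, tbl_name, part_name)`
and signals a missing partition with a `NoSuchObjectException`; escaping of special characters
in partition values is left out.

```python
from collections import OrderedDict


class NoSuchObjectException(Exception):
    """Local stand-in for the metastore's NoSuchObjectException (assumption)."""


def partition_name(spec):
    """Build a name such as 'ds=2016-06-14/hour=00' from an ordered mapping.

    Keys must follow the table's partition-column order. Escaping of special
    characters in values is omitted in this sketch.
    """
    return "/".join("{}={}".format(key, value) for key, value in spec.items())


def partition_exists(client, schema, table, spec):
    """Return True if the exact partition exists, using one point lookup."""
    try:
        client.get_partition_by_name(schema, table, partition_name(spec))
        return True
    except NoSuchObjectException:
        return False


class FakeMetastoreClient:
    """Hypothetical in-memory client standing in for a real Thrift client."""

    def __init__(self, partitions):
        self._partitions = set(partitions)

    def get_partition_by_name(self, db_name, tbl_name, part_name):
        if (db_name, tbl_name, part_name) not in self._partitions:
            raise NoSuchObjectException(part_name)
        return part_name


client = FakeMetastoreClient([("default", "events", "ds=2016-06-14/hour=00")])
found = partition_exists(
    client, "default", "events", OrderedDict([("ds", "2016-06-14"), ("hour", "00")])
)
missing = partition_exists(
    client, "default", "events", OrderedDict([("ds", "2016-06-15"), ("hour", "00")])
)
```

This also shows the change in argument semantics: the caller has to supply a value for every
partition key, in order, rather than an arbitrary filter such as `ds='2016-06-14'`.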

