airflow-commits mailing list archives

From "ASF subversion and git services (JIRA)" <>
Subject [jira] [Commented] (AIRFLOW-243) Use a more efficient Thrift call for HivePartitionSensor
Date Wed, 29 Jun 2016 22:48:48 GMT


ASF subversion and git services commented on AIRFLOW-243:

Commit bf28de4e601c165020669fd593964187b6246131 in incubator-airflow's branch refs/heads/master
from [~xuanji]

[AIRFLOW-243] Create NamedHivePartitionSensor

Closes #1593 from zodiac/create-NamedHivePartitionSensor

> Use a more efficient Thrift call for HivePartitionSensor
> --------------------------------------------------------
>                 Key: AIRFLOW-243
>                 URL:
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: operators
>    Affects Versions: Airflow 2.0
>            Reporter: Paul Yang
>            Assignee: Li Xuanji
>            Priority: Minor
>             Fix For: Airflow 2.0
> The {{HivePartitionSensor}} uses the `get_partitions_by_filter` Thrift call that can
result in some expensive SQL queries for tables that have many partitions and are partitioned
by multiple keys. We've seen our metastore DB get hammered by these sensors resulting in service
degradation for other metastore users.
> The {{MetastorePartitionSensor}} is efficient, but it can result in too many connections
to the metastore DB.
> An alternative is to use the `get_partition_by_name` Thrift call that translates into
more efficient SQL queries. Because connections will be pooled on the Thrift server, the DB
won't get overloaded as with the {{MetastorePartitionSensor}}. The semantics of the arguments
will change, so either a new argument needs to be introduced, or a new operator needs to be
created.
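
The change in argument semantics described above can be sketched as follows. This is a minimal illustration, not the code from the commit: it assumes a fully qualified partition name of the form `schema.table/key1=val1/key2=val2`, and `parse_partition_name`, `FakeMetastoreClient`, and `partition_exists` are hypothetical names. The fake client stands in for a Thrift metastore client and signals a missing partition with `LookupError`, which is also an assumption made for the sketch.

```python
from typing import Tuple


def parse_partition_name(qualified_name: str,
                         default_schema: str = "default") -> Tuple[str, str, str]:
    """Split 'schema.table/key1=val1/key2=val2' into (schema, table, partition_name)."""
    # Split on the first "/" so dots inside partition values are left alone.
    table_part, sep, partition_name = qualified_name.partition("/")
    if not sep or not partition_name:
        raise ValueError(f"no partition spec in {qualified_name!r}")
    schema, dot, table = table_part.partition(".")
    if not dot:
        # No schema prefix: fall back to the default schema (assumed behavior).
        schema, table = default_schema, table_part
    return schema, table, partition_name


class FakeMetastoreClient:
    """In-memory stand-in for a Thrift metastore client (hypothetical)."""

    def __init__(self, partitions):
        self._partitions = set(partitions)

    def get_partition_by_name(self, db_name, tbl_name, part_name):
        key = (db_name, tbl_name, part_name)
        if key not in self._partitions:
            raise LookupError(f"partition not found: {key}")
        return key


def partition_exists(client, qualified_name: str) -> bool:
    """Check one partition with a single exact-name lookup, no filter expression."""
    schema, table, partition_name = parse_partition_name(qualified_name)
    try:
        client.get_partition_by_name(schema, table, partition_name)
    except LookupError:
        return False
    return True
```

A sensor built this way takes an exact partition name instead of a filter expression, which is why the issue notes that the arguments cannot keep their old meaning.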

This message was sent by Atlassian JIRA
