airflow-commits mailing list archives

From "Chris Riccomini (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AIRFLOW-243) Use a more efficient Thrift call for HivePartitionSensor
Date Thu, 30 Jun 2016 20:01:10 GMT

     [ https://issues.apache.org/jira/browse/AIRFLOW-243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Riccomini updated AIRFLOW-243:
------------------------------------
    Affects Version/s:     (was: Airflow 2.0)
                       Airflow 1.7.1.3

> Use a more efficient Thrift call for HivePartitionSensor
> --------------------------------------------------------
>
>                 Key: AIRFLOW-243
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-243
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: operators
>    Affects Versions: Airflow 1.7.1.3
>            Reporter: Paul Yang
>            Assignee: Li Xuanji
>            Priority: Minor
>             Fix For: Airflow 1.8
>
>
> The {{HivePartitionSensor}} uses the `get_partitions_by_filter` Thrift call that can
result in some expensive SQL queries for tables that have many partitions and are partitioned
by multiple keys. We've seen our metastore DB get hammered by these sensors resulting in service
degradation for other metastore users.
> The {{MetastorePartitionSensor}} is efficient, but it can result in too many connections
to the metastore DB.
> An alternative is to use the `get_partition_by_name` Thrift call that translates into
more efficient SQL queries. Because connections will be pooled on the Thrift server, the DB
won't get overloaded as with the {{MetastorePartitionSensor}}. The semantics of the arguments
will change, so either a new argument needs to be introduced, or a new operator needs to be
created.
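
To illustrate the change in argument semantics described above, here is a minimal sketch of a by-name existence check. It is not the Airflow implementation. `FakeMetastoreClient`, `NoSuchPartition`, `build_partition_name`, and `partition_exists` are hypothetical names introduced for illustration; only the `get_partition_by_name(db_name, tbl_name, part_name)` call shape mirrors the Thrift method named in the issue. The sketch assumes partition values need no escaping, which the real metastore does not guarantee.

```python
from collections import OrderedDict


class NoSuchPartition(Exception):
    """Hypothetical stand-in for the metastore's 'no such object' error."""


class FakeMetastoreClient:
    """In-memory stand-in for a Thrift metastore client (illustration only)."""

    def __init__(self, partitions):
        # Each entry is a (db_name, tbl_name, part_name) tuple.
        self._partitions = set(partitions)

    def get_partition_by_name(self, db_name, tbl_name, part_name):
        # Exact-name lookup: no filter expression has to be evaluated.
        if (db_name, tbl_name, part_name) not in self._partitions:
            raise NoSuchPartition(part_name)
        return {"db": db_name, "table": tbl_name, "name": part_name}


def build_partition_name(spec):
    """Join ordered key/value pairs as 'k1=v1/k2=v2'.

    Assumption: values contain no characters that would need escaping.
    """
    return "/".join("{}={}".format(key, value) for key, value in spec.items())


def partition_exists(client, db_name, tbl_name, spec):
    """Return True if the fully specified partition exists."""
    part_name = build_partition_name(spec)
    try:
        client.get_partition_by_name(db_name, tbl_name, part_name)
    except NoSuchPartition:
        return False
    return True


client = FakeMetastoreClient({("default", "events", "ds=2016-06-30/hour=20")})
spec = OrderedDict([("ds", "2016-06-30"), ("hour", "20")])
found = partition_exists(client, "default", "events", spec)
missing = partition_exists(
    client, "default", "events", OrderedDict([("ds", "2016-07-01"), ("hour", "00")])
)
```

A filter-based sensor accepts an expression such as `ds='2016-06-30'`, which can match many partitions. A by-name lookup needs every partition key, in the table's partition order, which is why the issue suggests either a new argument or a new operator.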



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
