airflow-commits mailing list archives

From "Michael Allman (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AIRFLOW-601) Airflow's Hive integration doesn't scale up to tables with more than 32,767 partitions (and this is really easy to fix)
Date Thu, 27 Oct 2016 03:31:58 GMT

     [ https://issues.apache.org/jira/browse/AIRFLOW-601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Allman updated AIRFLOW-601:
-----------------------------------
    Summary: Airflow's Hive integration doesn't scale up to tables with more than 32,767 partitions (and this is really easy to fix)  (was: Airflow's Hive integration doesn't scale up to tables with more than 32,767 partitions)

> Airflow's Hive integration doesn't scale up to tables with more than 32,767 partitions (and this is really easy to fix)
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-601
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-601
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: hive_hooks
>            Reporter: Michael Allman
>
> The Hive metastore API has a rather confusing method signature for {{listPartitions}}. The last method parameter specifies the maximum number of partitions to return, and its type is a Java short. So Airflow passes the maximum Java short value (32,767) and notes the limitation in its API docs:
> https://github.com/apache/incubator-airflow/blob/92064398c4c982a310925da376745a1713bf96e2/airflow/hooks/hive_hooks.py#L497-L499
> *However*, if you pass the magic number -1 as the "limit", then the metastore API will return *all* partitions. I found this documented here:
> https://issues.cloudera.org/browse/IMPALA-749
> I've also tried this myself on a Hive table with 80,000+ partitions.
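
The behaviour described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code in hive_hooks.py: FakeMetastoreClient is a hypothetical in-memory stand-in, the database and table names are made up, and the get_partitions / max_parts names are assumed to mirror the metastore Thrift client. The sketch contrasts passing the largest Java short with passing -1 as the limit.

```python
# Largest value a signed 16-bit Java short can hold.
JAVA_SHORT_MAX = 2**15 - 1  # 32,767

# Sentinel that, per the issue, makes the metastore return all partitions.
NO_LIMIT = -1


class FakeMetastoreClient:
    """Hypothetical in-memory stand-in for a metastore client (illustration only)."""

    def __init__(self, partition_count):
        self._partitions = ["ds=%05d" % i for i in range(partition_count)]

    def get_partitions(self, db_name, tbl_name, max_parts):
        # Mimics the behaviour reported in the issue: a negative limit
        # returns every partition, otherwise at most max_parts entries.
        if max_parts < 0:
            return list(self._partitions)
        return self._partitions[:max_parts]


def list_partitions(client, db_name, tbl_name, max_parts=NO_LIMIT):
    """Return partitions for a table, defaulting to the 'no limit' sentinel."""
    return client.get_partitions(
        db_name=db_name, tbl_name=tbl_name, max_parts=max_parts
    )


# A table with 80,000 partitions, similar in size to the one in the report.
client = FakeMetastoreClient(partition_count=80000)

capped = list_partitions(client, "default", "events", max_parts=JAVA_SHORT_MAX)
everything = list_partitions(client, "default", "events")

print(len(capped), len(everything))
```

With the capped limit the caller silently sees only the first 32,767 partitions, while the -1 sentinel returns all 80,000 in this simulated client.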



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
