airflow-commits mailing list archives

From "Hongbo Zeng (JIRA)" <j...@apache.org>
Subject [jira] [Closed] (AIRFLOW-118) use targetPartitionSize as the default partition spec for HiveToDruidTransfer operator
Date Sat, 25 Jun 2016 16:28:37 GMT

     [ https://issues.apache.org/jira/browse/AIRFLOW-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hongbo Zeng closed AIRFLOW-118.
-------------------------------
    Assignee: Hongbo Zeng

> use targetPartitionSize as the default partition spec for HiveToDruidTransfer operator
> ---------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-118
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-118
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: operators
>            Reporter: Hongbo Zeng
>            Assignee: Hongbo Zeng
>
> The definitions of the two partition specs can be found at http://druid.io/docs/latest/ingestion/batch-ingestion.html.
> Originally, HiveToDruidTransfer used numShards. The disadvantage is that users
> need to tune the number repeatedly, and retune it whenever the data size changes. This
> does not scale as the number of data sources grows. The targetPartitionSize approach calculates
> the number of segments automatically and is hassle-free.
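
To make the difference between the two partition specs concrete, here is a minimal sketch of how a hashed partitionsSpec might be built with targetPartitionSize as the default. The helper name build_partitions_spec, the constant DEFAULT_TARGET_PARTITION_SIZE, and the value 5,000,000 rows are assumptions for illustration only; they are not taken from this issue or from the operator's actual code.

```python
import json

# Hypothetical default (rows per segment); not taken from the issue itself.
DEFAULT_TARGET_PARTITION_SIZE = 5_000_000


def build_partitions_spec(num_shards=None,
                          target_partition_size=DEFAULT_TARGET_PARTITION_SIZE):
    """Build a hashed partitionsSpec dict for a Druid batch ingestion spec.

    If num_shards is given (a positive integer), the shard count is fixed
    and must be retuned by hand when the data size changes. Otherwise
    targetPartitionSize is used, and Druid works out the number of
    segments from the data.
    """
    spec = {"type": "hashed"}
    if num_shards is not None and num_shards > 0:
        spec["numShards"] = num_shards
    else:
        spec["targetPartitionSize"] = target_partition_size
    return spec


if __name__ == "__main__":
    # Default: size-based partitioning, no manual tuning.
    print(json.dumps(build_partitions_spec(), sort_keys=True))
    # Explicit override: fixed shard count, as in the original behaviour.
    print(json.dumps(build_partitions_spec(num_shards=4), sort_keys=True))
```

Only one of the two keys is emitted in this sketch, so an explicit numShards always takes precedence over the size-based default.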



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
