airflow-commits mailing list archives

From "Chris Riccomini (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-118) use targetPartitionSize as the default partition spec for HiveToDruidTransfer operator
Date Mon, 16 May 2016 20:18:12 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15285213#comment-15285213 ]

Chris Riccomini commented on AIRFLOW-118:
-----------------------------------------

Was there a PR for this?

> use targetPartitionSize as the default partition spec for HiveToDruidTransfer operator
> ---------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-118
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-118
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: operators
>            Reporter: Hongbo Zeng
>
> The definitions of the two partition specs can be found at http://druid.io/docs/latest/ingestion/batch-ingestion.html.
> Originally, HiveToDruidTransfer uses numShards. The disadvantage is that users need to
> tune the number repeatedly, and tune it again whenever the data size changes. This does
> not scale as the number of data sources grows. The targetPartitionSize approach calculates
> the number of segments automatically and is hassle-free.
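
For illustration, the difference between the two partition specs can be sketched as below. This is a minimal sketch, not the operator's actual code: the helper name build_partitions_spec and the default of 5,000,000 rows are assumptions made for this example, while "hashed", "numShards" and "targetPartitionSize" are the field names used in the Druid batch ingestion docs linked above.

```python
import json

# Assumed default for this sketch only; not taken from the issue or the operator.
DEFAULT_TARGET_PARTITION_SIZE = 5000000


def build_partitions_spec(target_partition_size=None, num_shards=None):
    """Build a Druid 'hashed' partitionsSpec dict (hypothetical helper).

    With no arguments, fall back to targetPartitionSize so that Druid
    decides how many segments to create. numShards is used only when
    the caller asks for it explicitly.
    """
    if target_partition_size is not None and num_shards is not None:
        raise ValueError("set either target_partition_size or num_shards, not both")
    if num_shards is not None:
        # Fixed shard count: has to be re-tuned by hand as the data grows.
        return {"type": "hashed", "numShards": num_shards}
    if target_partition_size is None:
        target_partition_size = DEFAULT_TARGET_PARTITION_SIZE
    # Row-count target: the number of segments follows from the data size.
    return {"type": "hashed", "targetPartitionSize": target_partition_size}


if __name__ == "__main__":
    print(json.dumps(build_partitions_spec(), sort_keys=True))
    print(json.dumps(build_partitions_spec(num_shards=4), sort_keys=True))
```

With this shape, a user who never passes num_shards gets the automatic behaviour, and existing users who tuned a shard count can keep passing it.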



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
