airflow-commits mailing list archives

From "Hongbo Zeng (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AIRFLOW-118) use targetPartitionSize as the default partition spec for HiveToDruidTransfer operator
Date Sun, 15 May 2016 00:15:12 GMT

     [ https://issues.apache.org/jira/browse/AIRFLOW-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hongbo Zeng updated AIRFLOW-118:
--------------------------------
    Description: 
The definitions of the two partition specs can be found at http://druid.io/docs/latest/ingestion/batch-ingestion.html.

As originally written, HiveToDruidTransfer uses numShards. The disadvantage is that users need
to tune the number of shards repeatedly, and again whenever the data size changes. This does
not scale as the number of data sources grows. The targetPartitionSize approach calculates the
number of segments automatically, so no manual tuning is needed.
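
To make the difference concrete, here is a minimal sketch of the two hashed partition specs as Python dicts. The helper names (hashed_partitions_spec, estimated_segments) are hypothetical and not part of Airflow or Druid. The key names follow the Druid batch-ingestion documentation linked above. The segment estimate is a simplification, since Druid's actual partitioning is only approximate.

```python
import math


def hashed_partitions_spec(target_partition_size=None, num_shards=None):
    """Build a Druid-style hashed partitionsSpec dict (hypothetical helper)."""
    # Exactly one of the two tuning knobs should be supplied.
    if (target_partition_size is None) == (num_shards is None):
        raise ValueError("set exactly one of target_partition_size or num_shards")
    spec = {"type": "hashed"}
    if target_partition_size is not None:
        # Rows per segment; the number of segments is derived from the data.
        spec["targetPartitionSize"] = target_partition_size
    else:
        # Fixed segment count; must be re-tuned by hand as data size changes.
        spec["numShards"] = num_shards
    return spec


def estimated_segments(row_count, target_partition_size):
    """Rough per-interval segment count (assumption: ceil(rows / target))."""
    return max(1, math.ceil(row_count / target_partition_size))
```

With a target of 5,000,000 rows per partition, a data source growing from 12 million to 50 million rows would move from roughly 3 to roughly 10 segments under this estimate without any change to the spec, whereas a numShards value would have to be edited by hand.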


  was:
The definition of the two partition spec can be found (here)[http://druid.io/docs/latest/ingestion/batch-ingestion.html].

Originally, the HiveToDruidTransfer uses numShards. The disadvantage of that is users need
to tune the numbers repeatedly, and do that again when the data size changes. This is not
scalable as the number of data sources grows. targetPartitionSize approach calculates the
number of segments automatically and is hassle free.



> use targetPartitionSize as the default partition spec for HiveToDruidTransfer operator
> ---------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-118
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-118
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: operators
>            Reporter: Hongbo Zeng
>
> The definitions of the two partition specs can be found at http://druid.io/docs/latest/ingestion/batch-ingestion.html.
> As originally written, HiveToDruidTransfer uses numShards. The disadvantage is that users
> need to tune the number of shards repeatedly, and again whenever the data size changes. This
> does not scale as the number of data sources grows. The targetPartitionSize approach calculates
> the number of segments automatically, so no manual tuning is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
