hive-issues mailing list archives

From "Jesus Camacho Rodriguez (JIRA)" <j...@apache.org>
Subject [jira] [Work started] (HIVE-13750) Avoid additional shuffle stage created by Sorted Dynamic Partition Optimizer when possible
Date Tue, 17 May 2016 10:23:13 GMT

     [ https://issues.apache.org/jira/browse/HIVE-13750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HIVE-13750 started by Jesus Camacho Rodriguez.
------------------------------------------------------
> Avoid additional shuffle stage created by Sorted Dynamic Partition Optimizer when possible
> ------------------------------------------------------------------------------------------
>
>                 Key: HIVE-13750
>                 URL: https://issues.apache.org/jira/browse/HIVE-13750
>             Project: Hive
>          Issue Type: Improvement
>          Components: Physical Optimizer
>    Affects Versions: 2.1.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>         Attachments: HIVE-13750.patch, HIVE-13750.patch
>
>
> Extend ReduceDedup to remove the additional shuffle stage created by the sorted dynamic
> partition optimizer when possible, thus avoiding unnecessary work.
> By [~ashutoshc]:
> {quote}
> Currently, if the config is on, Sorted Dynamic Partition Optimizer (SDPO) unconditionally
> adds an extra shuffle stage. If the sort columns of the previous shuffle and the partitioning
> columns of the table match, the reduce sink deduplication optimizer removes the extra shuffle
> stage, thus bringing the overhead down to zero. However, if they don’t match, we end up doing
> an extra shuffle. This can be improved, since we can add the table partition columns as sort
> columns on the earlier shuffle and avoid this extra shuffle. This ensures that in cases where
> the query already has a shuffle stage, we are not shuffling the data again.
> {quote}
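
The reasoning quoted above can be illustrated with a small toy model. This is only a sketch, not Hive code: the function names `shuffles_before` and `plan_after` are hypothetical, "match" is simplified to list equality, and appending the partition columns after the existing sort columns is an arbitrary ordering chosen for illustration, not necessarily what the patch does.

```python
from typing import List, Tuple


def shuffles_before(prev_sort_cols: List[str], partition_cols: List[str]) -> int:
    """Toy model of the behavior described as current.

    SDPO always adds a shuffle; reduce sink deduplication removes it only
    when the previous shuffle's sort columns match the partition columns
    (modeled here, as an assumption, by plain list equality).
    """
    return 1 if prev_sort_cols == partition_cols else 2


def plan_after(prev_sort_cols: List[str], partition_cols: List[str]) -> Tuple[List[str], int]:
    """Toy model of the proposed behavior.

    Instead of adding a stage, extend the earlier shuffle's sort columns
    with any table partition columns it does not already contain.
    The append order is an illustrative assumption.
    """
    merged = list(prev_sort_cols)
    for col in partition_cols:
        if col not in merged:
            merged.append(col)
    return merged, 1  # a single shuffle stage remains


if __name__ == "__main__":
    prev = ["customer_id"]      # hypothetical sort columns of the earlier shuffle
    parts = ["ds", "country"]   # hypothetical table partition columns
    print(shuffles_before(prev, parts))
    print(plan_after(prev, parts))
```

In this model the mismatched case costs two shuffle stages before the change and one after it, while the matching case costs one stage either way.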



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
