hive-dev mailing list archives

From "Jesus Camacho Rodriguez (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-13750) Avoid additional shuffle stage created by Sorted Dynamic Partition Optimizer when possible
Date Thu, 12 May 2016 18:58:12 GMT
Jesus Camacho Rodriguez created HIVE-13750:
----------------------------------------------

             Summary: Avoid additional shuffle stage created by Sorted Dynamic Partition Optimizer when possible
                 Key: HIVE-13750
                 URL: https://issues.apache.org/jira/browse/HIVE-13750
             Project: Hive
          Issue Type: Improvement
          Components: Physical Optimizer
    Affects Versions: 2.1.0
            Reporter: Jesus Camacho Rodriguez
            Assignee: Jesus Camacho Rodriguez


Extend ReduceDedup to remove the additional shuffle stage created by the Sorted Dynamic Partition
Optimizer when possible, thus avoiding unnecessary work.

By [~ashutoshc]:
{quote}
Currently, if the config is on, the Sorted Dynamic Partition Optimizer (SDPO) unconditionally adds
an extra shuffle stage. If the sort columns of the previous shuffle and the partitioning columns of the table
match, the reduce sink deduplication optimizer removes the extra shuffle stage, thus bringing the
overhead down to zero. However, if they don’t match, we end up doing an extra shuffle. This can
be improved: we can add the table partition columns as sort columns on the earlier shuffle
and avoid this extra shuffle. This ensures that, in cases where the query already has a shuffle stage,
we are not shuffling data again.
{quote}
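The following is a minimal Java sketch of the proposed change, written under simplifying assumptions. It models a shuffle (reduce sink) only by its ordered sort keys. It ignores reducer partitioning keys and does not check whether the partition columns are actually available at the earlier stage.

The class `SdpoShuffleSketch`, the `Shuffle` record, every method name and the column names `customer_id`/`ds` are all hypothetical. None of them is Hive's ReduceDedup code.

Two further assumptions are made here; the ticket specifies neither:
- "match" is read as "the earlier shuffle's leading sort keys equal the table's partition columns".
- The partition columns are placed first when they are added to the earlier shuffle.

```java
import java.util.ArrayList;
import java.util.List;

class SdpoShuffleSketch {

    // Hypothetical, simplified model of a shuffle (reduce sink): only its ordered sort keys.
    record Shuffle(List<String> sortKeys) {}

    // Assumed reading of "match": the earlier shuffle already sorts on the
    // table's partition columns, in order, as its leading keys.
    static boolean sortKeysMatchPartitionCols(Shuffle prior, List<String> partitionCols) {
        List<String> keys = prior.sortKeys();
        return keys.size() >= partitionCols.size()
                && keys.subList(0, partitionCols.size()).equals(partitionCols);
    }

    // Behavior described in the ticket: SDPO always adds a shuffle, and dedup
    // removes it only when the keys match. Returns the number of shuffles on
    // this path (the earlier one, plus SDPO's extra one if it survives).
    static int shufflesToday(Shuffle prior, List<String> partitionCols) {
        return sortKeysMatchPartitionCols(prior, partitionCols) ? 1 : 2;
    }

    // Proposed behavior (sketch): add the partition columns to the earlier
    // shuffle's sort keys so SDPO's extra shuffle can be deduplicated away.
    // Putting them first is an assumption; remaining keys keep their order.
    static Shuffle extendPriorShuffle(Shuffle prior, List<String> partitionCols) {
        if (sortKeysMatchPartitionCols(prior, partitionCols)) {
            return prior; // already compatible, nothing to change
        }
        List<String> keys = new ArrayList<>(partitionCols);
        for (String k : prior.sortKeys()) {
            if (!keys.contains(k)) {
                keys.add(k);
            }
        }
        return new Shuffle(List.copyOf(keys));
    }

    public static void main(String[] args) {
        Shuffle prior = new Shuffle(List.of("customer_id")); // hypothetical earlier shuffle
        List<String> partitionCols = List.of("ds");          // hypothetical partition column
        System.out.println(shufflesToday(prior, partitionCols));  // extra shuffle remains
        Shuffle extended = extendPriorShuffle(prior, partitionCols);
        System.out.println(extended.sortKeys());
        System.out.println(shufflesToday(extended, partitionCols)); // single shuffle
    }
}
```

Running `main` prints `2`, then `[ds, customer_id]`, then `1`. This shows the extra shuffle disappearing once the earlier shuffle also sorts on the partition column. That ordering is what SDPO's extra shuffle would otherwise provide.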



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
