hive-dev mailing list archives

From "Yin Huai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-3430) group by followed by join with the same key should be optimized
Date Wed, 05 Sep 2012 15:15:07 GMT

    [ https://issues.apache.org/jira/browse/HIVE-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448814#comment-13448814 ]

Yin Huai commented on HIVE-3430:
--------------------------------

Yes, YSmart (https://issues.apache.org/jira/browse/HIVE-2206) can optimize this pattern. 

For the query shown below, two jobs will be generated. The first one takes care of the join operation
on "key", and the second one takes care of the group by and join operations on "value", which share the same key.
{code:SQL}
select * from
(
  select c.value, count(1) as cnt from
  (
    select b.key, b.value from
    (
      select key, length(value) from T1 where ds = '1'
    ) a
    join
    T2 b on b.ds = '1' and a.key = b.key
  ) c
  group by c.value
) d
join
(
  select value, count(1) as cnt from T2 c where c.ds = '1' group by value
) e
on d.value = e.value;
{code}
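
To illustrate why the group by and join on "value" can fit in one job, here is a toy Python sketch of a single shuffle on "value" that computes both counts and joins them in the same reduce step. It is only a model of the idea, not Hive's or YSmart's actual implementation; the function names, the "d"/"e" tags, and the sample data are all hypothetical.

```python
from collections import defaultdict

def shuffle_by_value(tagged_rows):
    """Partition (tag, value) records by value, as one shuffle phase would."""
    partitions = defaultdict(list)
    for tag, value in tagged_rows:
        partitions[value].append(tag)
    return partitions

def reduce_group_and_join(partitions):
    """Per value, compute both counts and emit a joined row (inner join)."""
    out = []
    for value, tags in sorted(partitions.items()):
        d_cnt = tags.count("d")  # count(1) for subquery d
        e_cnt = tags.count("e")  # count(1) for subquery e
        if d_cnt and e_cnt:      # keep only values present on both sides
            out.append((value, d_cnt, value, e_cnt))
    return out

# Hypothetical inputs: "value" column produced by the first job (join on key),
# and the "value" column of T2 where ds = '1'.
job1_values = ["x", "x", "y"]
t2_values = ["x", "y", "y", "z"]

# Tag each record with the subquery it feeds, then shuffle once on value.
tagged = [("d", v) for v in job1_values] + [("e", v) for v in t2_values]
result = reduce_group_and_join(shuffle_by_value(tagged))
print(result)
```

Because both aggregations and the join are keyed on "value", every record needed for one output row lands in the same partition, so no second shuffle is required in this model.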
                
> group by followed by join with the same key should be optimized
> ---------------------------------------------------------------
>
>                 Key: HIVE-3430
>                 URL: https://issues.apache.org/jira/browse/HIVE-3430
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
