hive-dev mailing list archives

From "Namit Jain (JIRA)" <>
Subject [jira] [Created] (HIVE-3276) optimize union sub-queries
Date Thu, 19 Jul 2012 04:44:34 GMT
Namit Jain created HIVE-3276:

             Summary: optimize union sub-queries
                 Key: HIVE-3276
             Project: Hive
          Issue Type: Bug
            Reporter: Namit Jain
            Assignee: Nadeem Moidu

It might be a good idea to optimize simple union queries in which at
least one of the sub-queries contains map-reduce jobs.

For example, a query like:

insert overwrite table T1 partition P1
select * from 
(
  subq1
    union all
  subq2
) u;

currently creates 3 map-reduce jobs: one for subq1, another for subq2, and
a final one for the union.

It might be a good idea to optimize this. Instead of creating the union 
task, it might be simpler to create a move task (or something like a move
task), where the outputs of the two sub-queries will be moved to the final 
directory. This can easily extend to more than 2 sub-queries in the union.
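
As a rough illustration of the idea only (this is not Hive code), the sketch
below simulates the "move" step on a local filesystem in Python. The function
name move_subquery_outputs, the tmp_subq1/tmp_subq2 directories, the
000000_0 part-file name, and the subqN_ prefix used to avoid name collisions
are all assumptions made for the example; how Hive would actually name and
place the files is not specified in this issue.

```python
import shutil
import tempfile
from pathlib import Path


def move_subquery_outputs(subquery_dirs, final_dir):
    """Move every output file of each sub-query into final_dir.

    Hypothetical helper: files are prefixed with the sub-query index so that
    identically named part files from different sub-queries do not collide.
    """
    final = Path(final_dir)
    final.mkdir(parents=True, exist_ok=True)
    moved = []
    for index, subquery_dir in enumerate(subquery_dirs):
        for part in sorted(Path(subquery_dir).iterdir()):
            if part.is_file():
                target = final / f"subq{index + 1}_{part.name}"
                shutil.move(str(part), str(target))
                moved.append(target.name)
    return moved


def demo():
    """Simulate two sub-query outputs and move them into one directory."""
    with tempfile.TemporaryDirectory() as root:
        root = Path(root)
        dirs = []
        for i in (1, 2):
            d = root / f"tmp_subq{i}"  # assumed per-sub-query output dir
            d.mkdir()
            (d / "000000_0").write_text(f"rows from subq{i}\n")
            dirs.append(d)
        # No third job reads and rewrites the rows; files are only renamed.
        moved = move_subquery_outputs(dirs, root / "T1" / "P1")
        return sorted(moved)
```

Because the loop runs over a list of directories, the same sketch covers a
union of more than 2 sub-queries without changes.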

This is only useful if the union is followed by a select * and then a
filesink. The optimization can be useful on its own, and can also be used
to optimize skewed joins.
