hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jan DolinĂ¡r <dolik....@gmail.com>
Subject Re: Multi-group-by select always scans entire table
Date Tue, 29 May 2012 05:57:25 GMT
On Fri, May 25, 2012 at 12:03 PM, Jan DolinĂ¡r <dolik.rce@gmail.com> wrote:
>
> -- see what happens when you try to perform multi-group-by query on one of
> the partitions
> EXPLAIN EXTENDED
> FROM partition_test
> LATERAL VIEW explode(col1) tmp AS exp_col1
> INSERT OVERWRITE DIRECTORY '/test/1'
>     SELECT exp_col1
>     WHERE (part_col=2)
> INSERT OVERWRITE DIRECTORY '/test/2'
>     SELECT exp_col1
>     WHERE (part_col=2);
> -- result: it wants to scan all partitions :-(
>

Since nobody else did, let me answer myself... In the end I found out that
the correct partition pruning can be achieved using subquery. Continuing
the example from my last post, the query would be:

FROM (
    SELECT * FROM partition_test
    LATERAL VIEW explode(col1) tmp AS exp_col1
    WHERE part_col=2
) t
INSERT OVERWRITE DIRECTORY '/test/1'
    SELECT exp_col1
INSERT OVERWRITE DIRECTORY '/test/2'
    SELECT exp_col1;

I still think the pruning should work correctly no matter how the query is
written, but for now I'm happy with this solution.

J. Dolinar

Mime
View raw message