hive-dev mailing list archives

From "Gopal V (JIRA)" <>
Subject [jira] [Updated] (HIVE-4926) Queries which specify clustered-by keys as constants will still scan all buckets
Date Wed, 24 Jul 2013 19:03:50 GMT


Gopal V updated HIVE-4926:

    Attachment: HIVE-4926-test.tgz

Simple self-contained test-case
> Queries which specify clustered-by keys as constants will still scan all buckets
> --------------------------------------------------------------------------------
>                 Key: HIVE-4926
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 0.12.0
>            Reporter: Gopal V
>         Attachments: HIVE-4926-test.tgz
> When tables are CLUSTERED BY (key) into multiple buckets, a query which specifies a key in the query predicate will still scan all buckets in the directory.
> In the ideal scenario, only one bucket needs to be inspected for a given key, particularly if hive.enforce.bucketing is turned on.
> When a simple filter query like the following is run
> {code}
> select * from store_sales where ss_item_sk = 1;
> {code}
> The log files contain
> {code}
> Processing file hdfs://hadoop1.lxc:56565/user/hive/warehouse/hive_bucketed.db/store_sales/000005_0
> Processing file hdfs://hadoop1.lxc:56565/user/hive/warehouse/hive_bucketed.db/store_sales/000006_0
> Processing file hdfs://hadoop1.lxc:56565/user/hive/warehouse/hive_bucketed.db/store_sales/000007_0
> Processing file hdfs://hadoop1.lxc:56565/user/hive/warehouse/hive_bucketed.db/store_sales/000008_0
> Processing file hdfs://hadoop1.lxc:56565/user/hive/warehouse/hive_bucketed.db/store_sales/000009_0
> {code}
> This scans 32x the amount of data, compared to the right approach of scanning only the bucket which matches the predicate.
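
For readers following the thread, here is a rough sketch of the pruning the issue asks for: given a constant in the predicate on the clustered-by column, work out the single bucket file that could hold matching rows. It is not Hive's code. The function names are hypothetical, the bucket count of 32 is only inferred from the "32x" figure above, and the rule used (hash masked with Java's Integer.MAX_VALUE, modulo the bucket count, with an int column hashing to its own value) is my understanding of Hive's default bucketing for int keys and should be treated as an assumption. The file naming follows the pattern visible in the log above.

```python
# Illustrative sketch only; names are hypothetical and not part of Hive.
INT_MAX = 0x7FFFFFFF  # Java's Integer.MAX_VALUE


def bucket_for_int_key(key: int, num_buckets: int) -> int:
    """Assumed bucketing rule for an int column: the hash is the value
    itself, masked to be non-negative, then taken modulo the bucket count."""
    return (key & INT_MAX) % num_buckets


def bucket_file_name(bucket: int) -> str:
    """Bucket files in the log are named like 000005_0: a zero-padded
    six-digit bucket number followed by a "_0" suffix (suffix assumed fixed)."""
    return f"{bucket:06d}_0"


def files_to_scan(key: int, num_buckets: int) -> list:
    """With pruning, a predicate `col = key` needs one file, not all of them."""
    return [bucket_file_name(bucket_for_int_key(key, num_buckets))]


if __name__ == "__main__":
    # Assumed 32 buckets; predicate ss_item_sk = 1 from the query above.
    print(files_to_scan(1, 32))
```

Under these assumptions the query with ss_item_sk = 1 would only need to read one file in the store_sales directory instead of every bucket file listed in the log.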

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators