hive-user mailing list archives

From Edward Capriolo <>
Subject Re: Optimizing ORC Sorting - Replace two level Partitions with one?
Date Sat, 10 Aug 2013 16:39:03 GMT
So there is one thing to be really careful about with bucketing. Say you
bucket a table into 10 buckets: a select with a where clause does not
actually prune the input buckets, so many queries still scan all the buckets.
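
To make the point concrete, here is a rough sketch in Python (a toy simulation, not Hive itself). It assumes rows are assigned to buckets by a hash of the source column modulo 10; the CRC32 hash, the row layout, and all function names are made up for illustration and are not Hive's actual bucketing hash or API. It only contrasts how many bucket files get opened with and without pruning.

```python
import zlib

NUM_BUCKETS = 10  # the 10-bucket table from the example above


def bucket_for(source: str) -> int:
    # Stand-in hash (CRC32), chosen only because it is deterministic.
    # This is an assumption for illustration, not Hive's bucketing hash.
    return zlib.crc32(source.encode("utf-8")) % NUM_BUCKETS


def build_buckets(rows):
    # Distribute rows into bucket "files" by the hash of the source column.
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for row in rows:
        buckets[bucket_for(row["source"])].append(row)
    return buckets


def scan_without_pruning(buckets, source):
    # Every bucket is opened and the filter is applied row by row.
    files_read = len(buckets)
    matches = [r for b in buckets for r in b if r["source"] == source]
    return files_read, matches


def scan_with_pruning(buckets, source):
    # Only the single bucket the value hashes to is opened.
    target = buckets[bucket_for(source)]
    return 1, [r for r in target if r["source"] == source]


# Hypothetical data: 6 sources, 60 rows.
rows = [{"source": f"src{i % 6}", "value": i} for i in range(60)]
buckets = build_buckets(rows)

print(scan_without_pruning(buckets, "src3")[0])  # bucket files opened
print(scan_with_pruning(buckets, "src3")[0])     # bucket files opened
```

Both scans return the same rows; the only difference is how many bucket files are read, which is the cost to watch out for.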

On Sat, Aug 10, 2013 at 12:34 PM, Nitin Pawar <> wrote:

> Will bucketing help, if you know you have a finite # of partitions?
> On Sat, Aug 10, 2013 at 9:26 PM, John Omernik <> wrote:
>> I have a table that currently uses RC files and has two levels of
>> partitions: day and source. The table is first partitioned by day, then
>> within each day there are 6-15 source partitions. This makes for a lot of
>> crazy partitions, and I was wondering if there'd be a way to optimize this
>> with ORC files and some sorting.
>> Specifically, would there be a way in a new table to make source a field
>> (removing the partition) and somehow, as I am inserting into this new setup,
>> sort by source in a way that helps separate the files/indexes and
>> gives me almost the same performance as ORC with the two-level
>> partitions? Just trying to optimize here and curious what people think.
>> John
> --
> Nitin Pawar
