hive-user mailing list archives

From John Omernik <j...@omernik.com>
Subject Optimizing ORC Sorting - Replace two level Partitions with one?
Date Sat, 10 Aug 2013 15:56:16 GMT
I have a table that currently uses RC files and has two levels of
partitions: day and source. The table is first partitioned by day, then
within each day there are 6-15 source partitions. This makes for a very
large number of partitions, and I was wondering if there'd be a way to
optimize this with ORC files and some sorting.

Specifically, would there be a way in a new table to make source a regular
field (removing the partition) and then, as I am inserting into this new
setup, sort by source so that the files/indexes are laid out in a way that
gives me almost the same performance as ORC with the two-level partitions?
Just trying to optimize here and curious what people think.
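
To make the idea concrete, here is a toy Python model of what I am hoping
sorting would buy me. It is not Hive code and only a rough sketch. It assumes
the file keeps a min/max value of the source column for each fixed-size chunk
of rows, and that a reader can skip any chunk whose range cannot contain the
requested source. The names chunk_ranges and chunks_to_read, the chunk size,
and the "srcNN" values are all made up for illustration.

```python
import random


def chunk_ranges(rows, chunk_size):
    """Split rows into fixed-size chunks and record (min, max) of source per chunk."""
    ranges = []
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        ranges.append((min(chunk), max(chunk)))
    return ranges


def chunks_to_read(ranges, wanted):
    """Count chunks whose [min, max] range could contain the wanted source."""
    return sum(1 for lo, hi in ranges if lo <= wanted <= hi)


random.seed(0)

# Hypothetical data: one day's rows, each row reduced to its source value.
sources = ["src%02d" % i for i in range(10)]
rows = [random.choice(sources) for _ in range(100_000)]

# Same rows, same chunk size; only the insert order differs.
unsorted_ranges = chunk_ranges(rows, 10_000)
sorted_ranges = chunk_ranges(sorted(rows), 10_000)

print("unsorted:", chunks_to_read(unsorted_ranges, "src03"), "of", len(unsorted_ranges))
print("sorted:  ", chunks_to_read(sorted_ranges, "src03"), "of", len(sorted_ranges))
```

In this model the unsorted layout forces a read of every chunk, because each
chunk's range spans all sources, while the sorted layout touches only the few
chunks that actually hold the requested source. Whether Hive and ORC behave
this way in practice is exactly what I am asking.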

John
