hive-user mailing list archives

From Sammy Yu <>
Subject Dealing with large number of partitions
Date Fri, 11 Jun 2010 06:25:27 GMT
   I am having an issue with a large number of partitions (4,000 of them,
each very small, <10k files). Any queries I run that involve these
partitions take an extremely long time to complete (10+ hours). I was
wondering whether there is an easy way in Hive to improve performance
without having to merge the files. I can see that the MapReduce jobs take a
long time because there are so many separate raw data files that need to be
read. I saw that HIVE-1332 dealt with using HAR files for partitioning.
Could this help performance rather than hurt it, given that the queries
will be using all the partitions in the HAR file?
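A rough way to see why many small files slow the jobs down: under plain file-based input splitting, every file yields at least one split, and so at least one map task. The Python sketch below is a simplified model, not Hadoop's actual split logic. It assumes a hypothetical layout of one ~10 KB file per partition and a 64 MB block size, and compares that with greedily packing small files into larger splits, in the spirit of Hadoop's CombineFileInputFormat. Both helper functions are hypothetical illustrations.

```python
import math


def count_splits(file_sizes, block_size):
    """Simplified model (assumption): each file gives at least one split,
    and larger files are cut into block-sized pieces."""
    return sum(max(1, math.ceil(size / block_size)) for size in file_sizes)


def count_combined_splits(file_sizes, max_split_size):
    """Hypothetical packing model: greedily group consecutive files into
    splits of at most max_split_size bytes (a file larger than the limit
    still forms a single split in this simplified model)."""
    splits = 0
    current = 0
    for size in file_sizes:
        # Start a new split when adding this file would exceed the limit.
        if current and current + size > max_split_size:
            splits += 1
            current = 0
        current += size
    if current:
        splits += 1
    return splits


if __name__ == "__main__":
    # Hypothetical layout: 4,000 partitions, one ~10 KB file each.
    files = [10_000] * 4000
    block = 64 * 1024 * 1024
    print(count_splits(files, block))           # 4000
    print(count_combined_splits(files, block))  # 1
```

Under these assumptions, 4,000 tiny files mean about 4,000 map tasks, each doing very little work, while packing them into block-sized splits needs only one. Real job behavior also depends on task startup cost, cluster capacity, and the input format Hive is configured to use.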

