hive-user mailing list archives

From Ashish Thusoo <athu...@facebook.com>
Subject RE: Dealing with large number of partitions
Date Fri, 11 Jun 2010 23:09:49 GMT
+1 to that. That should help, provided you are running Hadoop 0.20.

Ashish

________________________________
From: wd [mailto:wd@wdicc.com]
Sent: Thursday, June 10, 2010 11:36 PM
To: hive-user@hadoop.apache.org
Subject: Re: Dealing with large number of partitions


Try running set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat; before
your query; this may help.
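
For illustration, a minimal sketch of what that could look like in a Hive session. The table
name page_views and the split-size value are hypothetical, and mapred.max.split.size is shown
only as an assumption about how the combined split size might be capped on Hadoop 0.20; check
both against your own setup before relying on them.

```sql
-- Combine many small files into fewer, larger input splits (session-level setting).
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- Assumption: cap each combined split at roughly 256 MB; tune or omit as needed.
SET mapred.max.split.size=256000000;

-- Hypothetical query that scans all partitions of a hypothetical table.
SELECT COUNT(*) FROM page_views;
```

The settings apply only to the current session, so they have to be issued again (or placed in
an init script) for later sessions.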


2010/6/11 Sammy Yu <syu@brightedge.com>
Hi,
   I am having an issue with a large number of partitions (4000, each being very small <10k
files).  Any queries I run that involve these partitions take an extremely long time to
complete (10+ hours).  I was wondering if there is an easy way in Hive to improve performance
without having to merge the files.  I can see the MapReduce jobs are taking a long time
because there are so many separate raw data files that need to be read.
 I saw that HIVE-1332 dealt with using HAR files for partitioning.  Could this perhaps help
performance rather than hurt it, given that the queries will be using all the partitions in
the HAR file?

Thanks,
Sammy
