hive-user mailing list archives

From Mayuran Yogarajah <mayuran.yogara...@casalemedia.com>
Subject Can remnant partitions cause Hive to slow down?
Date Mon, 02 May 2011 20:08:59 GMT
We've noticed that our Hive jobs appear to be getting slower and slower 
every day even though the data set isn't really growing by much.
Here are some month-end run times, showing the date of each run and
the duration of the job in minutes:

2010/12/31 -> 19.2166666666667
2011/01/31 -> 24.55
2011/02/28 -> 44.6166666666667
2011/03/31 -> 49.9833333333333
2011/04/30 -> 55.3833333333333
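
To put numbers on the slowdown, here is a small Python sketch that computes
the change between consecutive runs and the overall ratio from the figures
above. The variable names (run_times, deltas, ratio) are only for
illustration and are not part of our jobs. By this arithmetic the latest run
takes roughly 2.9 times as long as the first one.

```python
# Run times in minutes, copied from the list above.
run_times = [
    ("2010/12/31", 19.2166666666667),
    ("2011/01/31", 24.55),
    ("2011/02/28", 44.6166666666667),
    ("2011/03/31", 49.9833333333333),
    ("2011/04/30", 55.3833333333333),
]

# Change in minutes between each run and the one before it.
deltas = [
    (cur_date, cur - prev)
    for (_, prev), (cur_date, cur) in zip(run_times, run_times[1:])
]

# Overall slowdown: duration of the last run divided by the first.
ratio = run_times[-1][1] / run_times[0][1]

for date, delta in deltas:
    print(f"{date}: +{delta:.2f} min")
print(f"overall: {ratio:.2f}x the first run")
```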

The only thing that stands out is that we're not deleting older
partitions, so there are probably about two years' worth of partitions in
the system.
The jobs only use the partition for the current month, but I'm not sure
whether the other partitions can slow things down even though they are
not being used.

Any advice or suggestions are welcome.

thanks,
M
