hive-user mailing list archives

From Mapred Learn <>
Subject How to run big queries in optimized way ?
Date Fri, 21 Sep 2012 02:30:14 GMT
We have datasets which are about 10-15 TB in size.

We want to run Hive queries on top of this input data.

What are ways to reduce stress on our cluster when running many such big queries (including
joins) in parallel?
How do we enable compression etc. for intermediate Hive output?
How do we keep the job cache from growing too large?
In short, what are the best practices for huge queries on Hive?

Any input is really appreciated!

