hive-user mailing list archives

From bharath vissapragada <>
Subject Running TPCH workload on Hive
Date Fri, 14 Sep 2012 10:18:22 GMT
Hi folks,

I am trying to run the TPC-H workload on Hive (Hive-600), but I am facing
problems with the configuration. The queries are taking an insanely long time.

I ran Q21 on a TPC-H workload of SF 100 (the same dataset on which the
experiments in that document were run) on a cluster of 8 nodes, each running
both a DataNode and a TaskTracker, plus 1 NameNode. My worker node
configuration is as follows:

2 dual-core CPUs (4 threads in parallel in total)
3.8 GB of main memory per node

I configured 4 map slots and 4 reduce slots per node. I've set Hive's maximum
number of reducers to 32 (the total number of reduce slots in the Hadoop
cluster) instead of letting Hive decide it.
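As a rough sanity check on the setup above, here is a minimal Python sketch of the slot and memory arithmetic. It assumes the 8 worker nodes are identical and that every configured slot can be busy at once; the constant names are hypothetical and are not Hadoop or Hive settings.

```python
"""Back-of-the-envelope capacity check for the cluster described above.

Assumes all 8 worker nodes are identical and that every configured
map and reduce slot can be busy at the same time.
"""

WORKER_NODES = 8          # DataNode + TaskTracker hosts (NameNode excluded)
MAP_SLOTS_PER_NODE = 4
REDUCE_SLOTS_PER_NODE = 4
HW_THREADS_PER_NODE = 4   # 2 dual-core CPUs
MEMORY_GB_PER_NODE = 3.8


def cluster_capacity():
    """Return (map slots, reduce slots, GB per slot, tasks per HW thread)."""
    total_map_slots = WORKER_NODES * MAP_SLOTS_PER_NODE
    total_reduce_slots = WORKER_NODES * REDUCE_SLOTS_PER_NODE
    slots_per_node = MAP_SLOTS_PER_NODE + REDUCE_SLOTS_PER_NODE
    # Upper bound on memory per task if all slots on a node run at once;
    # the OS and the DataNode/TaskTracker daemons reduce this further.
    gb_per_slot = MEMORY_GB_PER_NODE / slots_per_node
    # How many concurrent tasks compete for each hardware thread.
    oversubscription = slots_per_node / HW_THREADS_PER_NODE
    return total_map_slots, total_reduce_slots, gb_per_slot, oversubscription


if __name__ == "__main__":
    maps, reduces, gb, ratio = cluster_capacity()
    print(f"map slots: {maps}, reduce slots: {reduces}")
    print(f"memory per slot (upper bound): {gb:.3f} GB")
    print(f"tasks per hardware thread: {ratio:.1f}")
```

Running it gives 32 map and 32 reduce slots (matching the reducer cap of 32), at most about 0.475 GB per task when all 8 slots on a node are busy, and 2 concurrent tasks per hardware thread.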

My Q21 has now been running for 12 hours, compared with the 2,500 seconds
reported in the results. I wonder what is so terribly wrong with my
configuration. Some of my reducers take an insanely long time (sometimes
6 hours), while others take 2 hours; even 2 hours is longer than the
2,500-second total run time reported for the same query.
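To put the gap in numbers, here is a small Python sketch that uses only the figures quoted above. The elapsed time is a lower bound because the job had not finished, and reading a large slowest-to-typical reducer ratio as uneven data distribution across reducers is a common heuristic, not a diagnosis. The function names are hypothetical.

```python
"""Rough comparison of the observed Q21 run with the published figure.

Uses only the numbers quoted in the message; the elapsed time is a
lower bound because the job had not finished.
"""

PUBLISHED_Q21_SECONDS = 2500
ELAPSED_HOURS_SO_FAR = 12
SLOWEST_REDUCER_HOURS = 6
TYPICAL_REDUCER_HOURS = 2


def hours_to_seconds(hours):
    """Convert hours to seconds."""
    return hours * 3600


def slowdown(elapsed_s, reference_s):
    """How many times longer the run has taken than the reference."""
    return elapsed_s / reference_s


def reducer_skew(slowest_h, typical_h):
    """Ratio of the slowest reducer to a typical one.

    Values well above 1 often mean some reducers receive much more data.
    """
    return slowest_h / typical_h


if __name__ == "__main__":
    elapsed = hours_to_seconds(ELAPSED_HOURS_SO_FAR)
    print(f"whole job: at least {slowdown(elapsed, PUBLISHED_Q21_SECONDS):.1f}x slower")
    typical = hours_to_seconds(TYPICAL_REDUCER_HOURS)
    print(f"one typical reducer vs whole published run: "
          f"{slowdown(typical, PUBLISHED_Q21_SECONDS):.1f}x")
    print(f"reducer skew: {reducer_skew(SLOWEST_REDUCER_HOURS, TYPICAL_REDUCER_HOURS):.1f}")
```

So far the job is at least about 17 times slower than the published run, and the slowest reducers take about 3 times as long as the others.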

Can someone help me with this? Is the data partitioned or something (in the
Bharath V.
