hive-user mailing list archives

From "Aggarwal, Vaibhav" <vagg...@amazon.com>
Subject RE: Benchmarking problems
Date Tue, 27 Sep 2011 17:49:17 GMT
You can turn speculative execution on, which might help with a few slow-progressing tasks.
The job conf options are mapred.map.tasks.speculative.execution and
mapred.reduce.tasks.speculative.execution.
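
As a minimal sketch, assuming the query is run from a Hive CLI session, both options can be
set for that session before the query is submitted. Whether they are already on depends on
the cluster's configuration, so treat this as an illustration rather than a required change:

```sql
-- Enable speculative execution for map and reduce tasks in this session only.
-- Assumption: run from the Hive CLI; a cluster-wide setting would go in the job conf instead.
set mapred.map.tasks.speculative.execution=true;
set mapred.reduce.tasks.speculative.execution=true;
```

Speculative execution launches a duplicate attempt of a slow task on another node, so it helps
when a node is slow but not when the task itself simply has more work to do.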


-----Original Message-----
From: bharath vissapragada [mailto:bharathvissapragada1990@gmail.com] 
Sent: Tuesday, September 27, 2011 1:22 AM
To: hive-user@hadoop.apache.org
Subject: Benchmarking problems

Hey,

I need some help with Hive. I am trying to benchmark Hive with the TPC-H SF 100 dataset. For
a simple SPJ query I ran (select count(*) from supplier, customer where s_nationkey = c_nationkey),
12 of my 13 reduce tasks completed in less than 2 hours and 1 ran for 6 hours. My cluster
details are as follows:

10 nodes (1 master + 9 TaskTracker/DataNode nodes), 3.5 GB RAM per TaskTracker (TT), a maximum
of 2 maps and 2 reducers per TT, 600 MB per task, 200 MB io.sort.mb.
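
For reference, the query described above can also be written with an explicit JOIN ... ON
clause. This is only a sketch: the aliases s and c are added here for illustration, and the
join column is assumed to be s_nationkey, as in the TPC-H schema.

```sql
-- Equi-join of supplier and customer on nation key, counting the joined rows.
-- Aliases s and c are illustrative and not part of the original query.
SELECT count(*)
FROM supplier s
JOIN customer c ON (s.s_nationkey = c.c_nationkey);
```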

I saw that no swapping occurred while running the reduce task.
Following is the tail of the log on the machine where the reduce task ran for 6 hours:

2011-09-26 22:48:48,285 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarding 47881000000 rows
2011-09-26 22:48:48,607 INFO ExecReducer: ExecReducer: processed 1280835 rows: used memory = 4840896
2011-09-26 22:48:48,608 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 finished. closing...
2011-09-26 22:48:48,608 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarded 47881693522 rows
2011-09-26 22:48:48,608 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0
2011-09-26 22:48:48,608 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 finished. closing...
2011-09-26 22:48:48,608 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarded 47881693522 rows
2011-09-26 22:48:48,608 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: 6 finished. closing...
2011-09-26 22:48:48,608 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: 6 forwarded 0 rows
2011-09-26 22:48:48,608 WARN org.apache.hadoop.hive.ql.exec.GroupByOperator: Begin Hash Table flush at close: size = 1
2011-09-26 22:48:48,608 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: 6 forwarding 1 rows
2011-09-26 22:48:48,608 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS hdfs://master:54310/tmp/hive-hadoop/hive_2011-09-26_16-36-07_678_4030630084749797567/_tmp.-mr-10002/000004_0
2011-09-26 22:48:48,609 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://master:54310/tmp/hive-hadoop/hive_2011-09-26_16-36-07_678_4030630084749797567/_tmp.-mr-10002/_tmp.000004_0
2011-09-26 22:48:48,609 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://master:54310/tmp/hive-hadoop/hive_2011-09-26_16-36-07_678_4030630084749797567/_tmp.-mr-10002/000004_0
2011-09-26 22:48:48,739 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 7 finished. closing...
2011-09-26 22:48:48,740 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 7 forwarded 0 rows
2011-09-26 22:48:48,847 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: 6 Close done
2011-09-26 22:48:48,847 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 Close done
2011-09-26 22:48:48,847 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 Close done
2011-09-26 22:48:48,851 INFO org.apache.hadoop.mapred.TaskRunner: Task:attempt_201109261629_0001_r_000004_0 is done. And is in the process of commiting
2011-09-26 22:48:48,854 INFO org.apache.hadoop.mapred.TaskRunner: Task 'attempt_201109261629_0001_r_000004_0' done.


One thing I noticed is that the row-forwarding stats are almost the same across all the tasks;
however, this task ran for 6 hours whereas all the others ran for just 1 to 2 hours.
Any help?

Thanks


--
Regards,
Bharath .V
w:http://researchweb.iiit.ac.in/~bharath.v
