hive-user mailing list archives

From bharath vissapragada <bharathvissapragada1...@gmail.com>
Subject Re: Benchmarking problems
Date Wed, 28 Sep 2011 04:08:53 GMT
I turned it off because it was trying to launch two copies of every
task, and they were hogging my TTs.

I am just curious about one thing: are the reducers in a JOIN CPU
intensive, or do they consume a lot of memory?

From my monitoring of the TT during the reduce phase, it was pretty clear
that there was no swapping. However, I was not sure about the CPU
usage.

Has anyone had the same experience, or found a workaround for this problem?
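
For what it's worth, here is a rough back-of-the-envelope check on the row counts in the log quoted below. It is only a sketch: it assumes the standard TPC-H cardinalities (10,000 x SF supplier rows, 150,000 x SF customer rows, 25 nations) and that nation keys are spread uniformly over both tables. Neither assumption is confirmed in this thread, and the function names are just illustrative.

```python
# Rough estimate of per-reducer join work for
#   select count(*) from supplier, customer where s_nationkey = c_nationkey
# Assumptions (not from the thread): standard TPC-H cardinalities and
# nation keys spread uniformly over both tables.

SCALE_FACTOR = 100
NATIONS = 25
SUPPLIER_ROWS = 10_000 * SCALE_FACTOR    # 1,000,000 at SF 100
CUSTOMER_ROWS = 150_000 * SCALE_FACTOR   # 15,000,000 at SF 100


def per_key_work():
    """Return (input rows, joined rows) for a single nation key."""
    suppliers = SUPPLIER_ROWS // NATIONS
    customers = CUSTOMER_ROWS // NATIONS
    # Every supplier of a nation pairs with every customer of that nation.
    return suppliers + customers, suppliers * customers


def reducer_work(keys_on_reducer):
    """Return (input rows, joined rows) for a reducer holding that many keys."""
    rows_in, rows_out = per_key_work()
    return keys_on_reducer * rows_in, keys_on_reducer * rows_out


if __name__ == "__main__":
    for keys in (1, 2):
        rows_in, rows_out = reducer_work(keys)
        print(f"{keys} key(s): {rows_in:,} input rows -> {rows_out:,} joined rows")
```

Under those assumptions a reducer holding two nation keys reads about 1,280,000 rows and emits about 48,000,000,000 joined rows, which is close to the 1280835 processed and 47881693522 forwarded rows in the log. If that reading is right, the reducer is mostly busy enumerating joined rows from a small input, which would point towards CPU rather than memory, but that is an inference and not something I measured.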

On Tue, Sep 27, 2011 at 11:19 PM, Aggarwal, Vaibhav <vaggarw@amazon.com> wrote:
> You can choose to turn speculative execution ON, which might help you with a few slow-progressing tasks.
> mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution are the job conf options.
>
>
> -----Original Message-----
> From: bharath vissapragada [mailto:bharathvissapragada1990@gmail.com]
> Sent: Tuesday, September 27, 2011 1:22 AM
> To: hive-user@hadoop.apache.org
> Subject: Benchmarking problems
>
> Hey,
>
> I need some help regarding Hive. I am trying to benchmark Hive with the TPC-H SF 100 dataset. For a simple SPJ query I ran (select count(*) from supplier, customer where s_nationkey = c_nationkey),
>
> out of my 13 reduce tasks, 12 completed in less than 2 hrs and 1 ran for 6 hours. The following are my cluster details:
>
> 10 nodes (1 master + 9 TTs+DNs), 3.5 GB RAM per TT, 2 maps and 2 reducers max per TT, 600 MB per task, 200 MB io.sort.mb.
>
> I saw that no swapping occurred while running the reduce task.
> Following is the tail of the log on the machine where the reduce ran for 6 hrs:
>
> 2011-09-26 22:48:48,285 INFO
> org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarding
> 47881000000 rows
> 2011-09-26 22:48:48,607 INFO ExecReducer: ExecReducer: processed
> 1280835 rows: used memory = 4840896
> 2011-09-26 22:48:48,608 INFO
> org.apache.hadoop.hive.ql.exec.JoinOperator: 4 finished. closing...
> 2011-09-26 22:48:48,608 INFO
> org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarded 47881693522 rows
> 2011-09-26 22:48:48,608 INFO
> org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0
> 2011-09-26 22:48:48,608 INFO
> org.apache.hadoop.hive.ql.exec.SelectOperator: 5 finished. closing...
> 2011-09-26 22:48:48,608 INFO
> org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarded 47881693522 rows
> 2011-09-26 22:48:48,608 INFO
> org.apache.hadoop.hive.ql.exec.GroupByOperator: 6 finished. closing...
> 2011-09-26 22:48:48,608 INFO
> org.apache.hadoop.hive.ql.exec.GroupByOperator: 6 forwarded 0 rows
> 2011-09-26 22:48:48,608 WARN
> org.apache.hadoop.hive.ql.exec.GroupByOperator: Begin Hash Table flush at close: size = 1
> 2011-09-26 22:48:48,608 INFO
> org.apache.hadoop.hive.ql.exec.GroupByOperator: 6 forwarding 1 rows
> 2011-09-26 22:48:48,608 INFO
> org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS
> hdfs://master:54310/tmp/hive-hadoop/hive_2011-09-26_16-36-07_678_4030630084749797567/_tmp.-mr-10002/000004_0
> 2011-09-26 22:48:48,609 INFO
> org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file:
> FS hdfs://master:54310/tmp/hive-hadoop/hive_2011-09-26_16-36-07_678_4030630084749797567/_tmp.-mr-10002/_tmp.000004_0
> 2011-09-26 22:48:48,609 INFO
> org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS
> hdfs://master:54310/tmp/hive-hadoop/hive_2011-09-26_16-36-07_678_4030630084749797567/_tmp.-mr-10002/000004_0
> 2011-09-26 22:48:48,739 INFO
> org.apache.hadoop.hive.ql.exec.FileSinkOperator: 7 finished.
> closing...
> 2011-09-26 22:48:48,740 INFO
> org.apache.hadoop.hive.ql.exec.FileSinkOperator: 7 forwarded 0 rows
> 2011-09-26 22:48:48,847 INFO
> org.apache.hadoop.hive.ql.exec.GroupByOperator: 6 Close done
> 2011-09-26 22:48:48,847 INFO
> org.apache.hadoop.hive.ql.exec.SelectOperator: 5 Close done
> 2011-09-26 22:48:48,847 INFO
> org.apache.hadoop.hive.ql.exec.JoinOperator: 4 Close done
> 2011-09-26 22:48:48,851 INFO org.apache.hadoop.mapred.TaskRunner:
> Task:attempt_201109261629_0001_r_000004_0 is done. And is in the process of commiting
> 2011-09-26 22:48:48,854 INFO org.apache.hadoop.mapred.TaskRunner: Task 'attempt_201109261629_0001_r_000004_0' done.
>
>
> One thing I noticed is that the row-forwarding stats are almost the same across all the tasks. However, this task ran for 6 hrs whereas all the others ran for just 1-2 hrs.
> Any help?
>
> Thanks
>
>
> --
> Regards,
> Bharath .V
> w:http://researchweb.iiit.ac.in/~bharath.v
>



-- 
Regards,
Bharath .V
w:http://researchweb.iiit.ac.in/~bharath.v
