hive-user mailing list archives

From dhruv kapatel <kapateldh...@gmail.com>
Subject Fwd: Which one should i use for benchmark tasks in hive & hadoop
Date Sun, 06 Mar 2016 05:47:37 GMT
Hi

I am comparing the performance of Pig and Hive on weblog data.
I was reading the Pig and Hive benchmark linked below. A statement on
page 10 says: "The CPU time
required by a job running on 10 node cluster will (more or less) be the same
than the time required to run the same job on a 1000 node cluster. However
the real time it takes the job to complete on the 1000 node cluster will be
100 times less than if it were to run on a 10 node cluster."

How can the job take the same CPU time on clusters of different capacity?

The benchmark reports both real time and cumulative CPU time.
Since real time is also affected by other processes, which of the two should
I use as the actual performance measure for Pig and Hive?
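
To make the two measures concrete, here is a minimal single-machine sketch in Python. It is only an illustration of real (wall-clock) time versus CPU time and does not use Hadoop's job counters; the helper names measure, busy and idle are hypothetical.

```python
import time

def measure(fn):
    """Return (wall_seconds, cpu_seconds) for one call of fn."""
    wall_start = time.perf_counter()   # real (wall-clock) time
    cpu_start = time.process_time()    # CPU time used by this process
    fn()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu

def busy():
    # CPU-bound work: wall time and CPU time are usually close
    total = 0
    for i in range(200_000):
        total += i * i
    return total

def idle():
    # Waiting: wall-clock time passes but almost no CPU time is used
    time.sleep(0.2)

busy_wall, busy_cpu = measure(busy)
idle_wall, idle_cpu = measure(idle)
print("busy: wall=%.3fs cpu=%.3fs" % (busy_wall, busy_cpu))
print("idle: wall=%.3fs cpu=%.3fs" % (idle_wall, idle_cpu))
```

In this sketch, waiting inflates the real time while leaving the CPU time almost unchanged, which is the same kind of gap I am asking about.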

See the Stack Overflow question below for more details.

http://stackoverflow.com/questions/35500987/which-one-should-i-use-for-benchmark-tasks-in-hadoop-usersys-time-or-total-cpu

http://www.ibm.com/developerworks/library/ba-pigvhive/pighivebenchmarking.pdf

-- 


*With Regards:Kapatel Dhruv v*
