hive-user mailing list archives

From Jiacai Liu <jiacai2...@gmail.com>
Subject Re: Which one should i use for benchmark tasks in hive & hadoop
Date Sun, 06 Mar 2016 12:08:29 GMT
I have answered this question on Stack Overflow. ☺

On Sun, Mar 6, 2016 at 1:47 PM, dhruv kapatel <kapateldhruv@gmail.com>
wrote:

>
>
> Hi
>
> I am comparing the performance of Pig and Hive on weblog data.
> I was reading the Pig and Hive benchmark linked below. One statement
> on page 10 says: "The CPU time
> required by a job running on 10 node cluster will (more or less) be the
> same
> than the time required to run the same job on a 1000 node cluster. However
> the real time it takes the job to complete on the 1000 node cluster will be
> 100 times less than if it were to run on a 10 node cluster."
>
> How can a job take the same CPU time on clusters of different capacity?
>
> The benchmark reports both real time and cumulative CPU time.
> Since real time is also affected by other processes, which time should I
> use as the actual performance measure for Pig and Hive?
>
> See question below for more details.
>
>
> http://stackoverflow.com/questions/35500987/which-one-should-i-use-for-benchmark-tasks-in-hadoop-usersys-time-or-total-cpu
>
>
> http://www.ibm.com/developerworks/library/ba-pigvhive/pighivebenchmarking.pdf
> .
>
> --
>
>
> *With Regards:Kapatel Dhruv v*
>
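
The statement quoted from page 10 can be checked with simple arithmetic. The sketch below assumes an idealized job whose work splits evenly across nodes with no scheduling, shuffle, or startup overhead; the function name `ideal_wall_time` and the figure of 36,000 CPU-seconds are hypothetical and do not come from the benchmark. Under that assumption the cumulative CPU time is fixed by the amount of work, while the real (wall-clock) time shrinks with the node count, so going from 10 to 1000 nodes gives the factor of 100 mentioned in the quote.

```python
def ideal_wall_time(total_cpu_seconds: float, nodes: int) -> float:
    """Wall-clock time when work divides evenly across nodes with no overhead.

    This is an idealized model, not a description of how Hadoop schedules tasks.
    """
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return total_cpu_seconds / nodes


# Hypothetical cumulative CPU time for one job; it is the same on both clusters
# because the total amount of work does not depend on the cluster size.
total_cpu_seconds = 36_000.0

wall_10 = ideal_wall_time(total_cpu_seconds, 10)
wall_1000 = ideal_wall_time(total_cpu_seconds, 1000)
speedup = wall_10 / wall_1000

print(f"cumulative CPU time: {total_cpu_seconds:.0f} s on either cluster")
print(f"wall time on 10 nodes:   {wall_10:.0f} s")
print(f"wall time on 1000 nodes: {wall_1000:.0f} s")
print(f"ratio: {speedup:.0f}x")
```

On a real cluster the ratio would be smaller than this ideal figure, since wall-clock time also includes I/O waits, task startup, and contention from other processes, which is why the two measurements answer different questions.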
