hadoop-mapreduce-user mailing list archives

From java8964 <java8...@hotmail.com>
Subject RE: Benchmarking Hive Changes
Date Wed, 05 Mar 2014 13:47:39 GMT
Are you running this on a single standalone box? How large are your test files, and how long
did the jobs of each type take?

> From: anthony@mattas.net
> Subject: Benchmarking Hive Changes
> Date: Tue, 4 Mar 2014 21:31:42 -0500
> To: user@hadoop.apache.org
> I’ve been trying to benchmark some of the Hive enhancements in Hadoop 2.0 using the
> HDP Sandbox.
> I took one of their example queries and executed it with the tables stored as TEXTFILE,
> RCFILE, and ORC. I also tried enabling vectorized execution and predicate pushdown.
> SELECT s07.description, s07.salary, s08.salary,
>   s08.salary - s07.salary
> FROM
>   sample_07 s07 JOIN sample_08 s08
> ON ( s07.code = s08.code)
> WHERE
>   s07.salary < s08.salary
> SORT BY s08.salary-s07.salary DESC
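
For reference, the storage format and settings mentioned in the quoted message could be set up roughly as follows. This is only a sketch, assuming a Hive version that supports ORC and vectorized execution; the table name sample_07_orc is hypothetical and does not come from the original message.

```sql
-- Hypothetical table name: sample_07_orc (not from the original message).
-- Copy an existing table into ORC format.
CREATE TABLE sample_07_orc STORED AS ORC AS SELECT * FROM sample_07;

-- Turn on vectorized query execution (only in Hive versions that support it).
SET hive.vectorized.execution.enabled = true;

-- Enable predicate pushdown in the optimizer, and let filters be
-- pushed down to the storage reader (used by ORC).
SET hive.optimize.ppd = true;
SET hive.optimize.index.filter = true;
```

The sample_08 table would need the same conversion before rerunning the join against the ORC copies.
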
> Ultimately, there was not much difference in performance across any of the executions. Can
> someone clarify for me whether I need an actual full cluster to see performance improvements,
> or whether I’m missing something else? I thought at minimum I would have seen an improvement
> moving to ORC