hadoop-mapreduce-user mailing list archives

From Anthony Mattas <anth...@mattas.net>
Subject Benchmarking Hive Changes
Date Wed, 05 Mar 2014 02:31:42 GMT
I’ve been trying to benchmark some of the Hive enhancements in Hadoop 2.0 using the HDP Sandbox.


I took one of their example queries and executed it with the tables stored as TEXTFILE, RCFILE,
and ORC. I also tried enabling vectorized execution and predicate pushdown.

SELECT s07.description, s07.salary, s08.salary,
  s08.salary - s07.salary
FROM
  sample_07 s07 JOIN sample_08 s08
ON ( s07.code = s08.code)
WHERE
 s07.salary < s08.salary
SORT BY s08.salary-s07.salary DESC
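For reference, the ORC variant and the settings mentioned above can be configured roughly as follows. This is a sketch that assumes Hive 0.13 or later, where vectorized execution applies only to ORC-backed tables. The table names sample_07_orc and sample_08_orc are hypothetical, and property behavior may differ between Hive versions.

```sql
-- Hypothetical ORC copies of the sandbox tables (names are illustrative).
CREATE TABLE sample_07_orc STORED AS ORC AS SELECT * FROM sample_07;
CREATE TABLE sample_08_orc STORED AS ORC AS SELECT * FROM sample_08;

-- Vectorized execution (assumes Hive 0.13+; only ORC tables benefit).
SET hive.vectorized.execution.enabled=true;

-- Predicate pushdown; hive.optimize.index.filter lets the ORC reader
-- use its min/max indexes to skip row groups that cannot match.
SET hive.optimize.ppd=true;
SET hive.optimize.index.filter=true;

-- Then re-run the same query against the ORC copies.
SELECT s07.description, s07.salary, s08.salary,
  s08.salary - s07.salary
FROM
  sample_07_orc s07 JOIN sample_08_orc s08
ON ( s07.code = s08.code)
WHERE
 s07.salary < s08.salary
SORT BY s08.salary-s07.salary DESC;
```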

Ultimately, there was not much difference in performance across any of the executions. Can someone
clarify whether I need an actual full cluster to see performance improvements, or whether I'm
missing something else? I thought that, at minimum, I would have seen an improvement moving from
TEXTFILE to ORC.