pig-user mailing list archives

From Zheng Shao <zs...@facebook.com>
Subject A simple performance benchmark for Hadoop, Hive and Pig
Date Fri, 19 Jun 2009 04:29:35 GMT
Hi all,

Yuntao Jia, our intern this summer, ran a simple performance benchmark for Hadoop, Hive and
Pig based on the queries in the SIGMOD 2009 paper "A Comparison of Approaches to Large-Scale
Data Analysis".

The report and the performance test kit are both attached here:
http://issues.apache.org/jira/browse/HIVE-396


We tried our best to get good performance out of Hive and Pig, and we kept the Hadoop program
as close as possible to the one in the SIGMOD paper.  We welcome all suggestions on how we can
improve the performance further, either by changing the configuration or by improving the code.


While we tried our best to be fair, system settings and environments do affect the results
a lot.  So we encourage everybody to try out the performance test kit on their own cluster,
and we would appreciate it if everybody could share their results.


Here is the summary.  The details are in the report hive_benchmark_2009-06-18.pdf from the
link above.

Query: GREP SELECT
Hadoop: 136.1s
Hive:   125.4s
Pig:    247.8s

Query: RANKINGS SELECT
Hadoop: 26.1s
Hive:   31.0s
Pig:    38.4s

Query: USERVISITS AGGREGATION
Hadoop: 533.8s
Hive:   768.8s
Pig:    855.4s

Query: RANKINGS USERVISITS JOIN
Hadoop: 470.0s
Hive:   471.3s
Pig:    763.9s
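
For a quick side-by-side reading, the timings above can be expressed as a ratio to the
Hadoop time for the same query. The Python snippet below is only an illustrative sketch and
is not part of the performance test kit; the names `timings` and `relative_to_hadoop` are
made up for this example, and the numbers are copied from the summary above.

```python
# Wall-clock times in seconds, copied from the summary in this message.
timings = {
    "GREP SELECT": {"Hadoop": 136.1, "Hive": 125.4, "Pig": 247.8},
    "RANKINGS SELECT": {"Hadoop": 26.1, "Hive": 31.0, "Pig": 38.4},
    "USERVISITS AGGREGATION": {"Hadoop": 533.8, "Hive": 768.8, "Pig": 855.4},
    "RANKINGS USERVISITS JOIN": {"Hadoop": 470.0, "Hive": 471.3, "Pig": 763.9},
}


def relative_to_hadoop(results):
    """Divide each system's time by the Hadoop time for the same query.

    A ratio above 1.0 means the system was slower than the hand-written
    Hadoop program; below 1.0 means it was faster.
    """
    ratios = {}
    for query, times in results.items():
        baseline = times["Hadoop"]
        ratios[query] = {
            system: round(seconds / baseline, 2)
            for system, seconds in times.items()
        }
    return ratios


if __name__ == "__main__":
    for query, ratios in relative_to_hadoop(timings).items():
        line = ", ".join(f"{name}: {value:.2f}x" for name, value in ratios.items())
        print(f"{query}: {line}")
```

These ratios only restate the numbers from this one cluster, so they should shift with
different system settings and environments.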

Please take a look at hive_benchmark_2009-06-18.pdf from the link above for details. Let's
keep discussions on http://issues.apache.org/jira/browse/HIVE-396 so it's easier to keep track.


Zheng

