hadoop-common-user mailing list archives

From Ashish Thusoo <athu...@facebook.com>
Subject Re: A simple performance benchmark for Hadoop, Hive and Pig
Date Fri, 19 Jun 2009 18:06:19 GMT
These numbers are definitely preliminary, and the reason we sent them out was to involve
the community from the get-go and have them critique this work. The mistake, though, was sending
this out on the users list as opposed to the dev lists.
Regarding "better than map/reduce": I think the number only beats the particular way
the query was implemented in the SIGMOD paper. It is more a reflection of the implementation
there than of map/reduce in general.

In keeping with Owen's comments, we should move this discussion to the dev lists; the users
list is not an appropriate forum for it.


----- Original Message -----
From: Owen O'Malley <owen.omalley@gmail.com>
To: core-user@hadoop.apache.org <core-user@hadoop.apache.org>; pig-user@hadoop.apache.org
<pig-user@hadoop.apache.org>; hive-user@hadoop.apache.org <hive-user@hadoop.apache.org>
Sent: Fri Jun 19 10:03:06 2009
Subject: Re: A simple performance benchmark for Hadoop, Hive and Pig

On Thu, Jun 18, 2009 at 9:29 PM, Zheng Shao <zshao@facebook.com> wrote:

> Yuntao Jia, our intern this summer, did a simple performance benchmark for
> Hadoop, Hive and Pig based on the queries in the SIGMOD 2009 paper: A
> Comparison of Approaches to Large-Scale Data Analysis

It should be noted that no one on the Pig team was involved in setting up
the benchmarks and the queries don't follow the Pig cookbook suggestions for
writing efficient queries, so these results should be considered *extremely*
preliminary. Furthermore, I can't see any way that Hive should be able to
beat raw map/reduce, since Hive uses map/reduce to run the job.

In the future, it would be better to involve the respective communities
(mapreduce-dev and pig-dev) far before pushing benchmark results out to the
user lists. The Hadoop project, which includes all three subprojects, needs
to be a cooperative community that is trying to build the best software we
can. Getting benchmark numbers is good, but it is better done in a
collaborative manner.

-- Owen