pig-dev mailing list archives

From Zheng Shao <zs...@facebook.com>
Subject asking for comments on benchmark queries
Date Tue, 23 Jun 2009 05:36:50 GMT
Hi Pig team,

We'd like to get your feedback on a set of queries we implemented on Pig.

We've attached the Hadoop configuration and Pig queries to this email. We start each query
by issuing "pig xxx.pig". The queries are from a SIGMOD 2009 paper. More details are at https://issues.apache.org/jira/browse/HIVE-396
(Shall we open a JIRA on PIG for this?)

One planned improvement is that we are going to switch Hadoop to LZO as the intermediate
compression algorithm very soon. So far we have used gzip for all performance tests, including
Hadoop, Hive, and Pig.

The reason we specify the number of reducers in each query is to match the number of
reducers that Hive automatically suggested. Please let us know the best way to set the
number of reducers in Pig.
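
For illustration, a reducer count can be given per operator with Pig's PARALLEL clause.
This is only a minimal sketch: the paths, relation names, field names, delimiter, and the
count of 60 are hypothetical and are not taken from the attached queries.

```
-- Hypothetical input path and schema, for illustration only.
visits = LOAD '/data/uservisits' USING PigStorage('|')
         AS (sourceIP:chararray, adRevenue:double);

-- PARALLEL sets the number of reduce tasks for the job that runs this GROUP.
grouped = GROUP visits BY sourceIP PARALLEL 60;

totals = FOREACH grouped GENERATE group, SUM(visits.adRevenue);
STORE totals INTO '/tmp/revenue_by_ip';
```

PARALLEL only affects operators that trigger a reduce phase (such as GROUP, JOIN, ORDER and
DISTINCT), so the value here would be chosen to match whatever Hive picked for the equivalent step.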

Are there any other improvements we can make to the Pig queries and the Hadoop configuration?

