hadoop-pig-dev mailing list archives

From Ashutosh Chauhan <ashutosh.chau...@gmail.com>
Subject optimizer hints in Pig
Date Sat, 14 Nov 2009 22:07:51 GMT
Hi All,

We would like to know how Pig devs feel about optimizer hints.
Traditionally, optimizer hints have met with mixed reactions in the
RDBMS world. Oracle provides lots of knobs[1][2] to turn and tune,
while PostgreSQL[3][4] has tried to stay away from them. MySQL has a
few of them (e.g., straight_join). Surajit Chaudhuri [5] (Microsoft)
makes a case in favor of them.
More specifically, I am talking about hints like the following:

a = filter 'mydata' by myudf ($1) with "selectivity 0.5";
// This lets the user tell Pig that myudf filters out nearly half of
// the tuples of 'mydata'.

c = join a by $0, b by $0 with "selectivity a.$0 = b.$0, 0.1";
// This lets the user tell Pig that only 10% of the keys in a will
// match those in b.
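
To make the intent concrete, here is a rough sketch (in Python, not Pig)
of how an optimizer could turn such hints into size estimates. The
function names, the row count, and the key count are all made up for
illustration; nothing here reflects how Pig works today.

```python
def estimate_filter_rows(input_rows, selectivity):
    """Estimated rows passing a filter, given the hinted fraction that passes."""
    if not 0.0 <= selectivity <= 1.0:
        raise ValueError("selectivity must be between 0 and 1")
    return round(input_rows * selectivity)


def estimate_matching_keys(distinct_keys, selectivity):
    """Estimated number of join keys on one side that find a match on the other."""
    if not 0.0 <= selectivity <= 1.0:
        raise ValueError("selectivity must be between 0 and 1")
    return round(distinct_keys * selectivity)


# Hypothetical sizes, assumed only for this example.
mydata_rows = 1_000_000   # tuples in 'mydata'
a_distinct_keys = 200_000  # distinct values of $0 in a

# Hint "selectivity 0.5" on the filter: about half the tuples survive.
a_rows = estimate_filter_rows(mydata_rows, 0.5)

# Hint "selectivity a.$0 = b.$0, 0.1" on the join: about 10% of a's keys match.
matching_keys = estimate_matching_keys(a_distinct_keys, 0.1)

print(a_rows, matching_keys)
```

Estimates like these are what would let a planner choose, say, a join
strategy or the number of reducers; how Pig would actually use them is
an open question.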

The exact syntax isn't important; it could be adapted. The question is
whether this seems a useful enough idea to be added to Pig Latin.
Pig's case is slightly different from other SQL engines: while other
systems treat them as "hints" and are thus free to ignore them, Pig
treats hints as commands, in the sense that it will follow a hint even
when it can figure out that the hint will cause the query to fail.
Perhaps Pig could interpret "using" as a command and "with" as a hint.



[1] http://www.dba-oracle.com/art_otn_cbo_p7.htm
[2] http://www.dba-oracle.com/oracle11g/oracle_11g_extended_optimizer_statistics.htm
[3] http://archives.postgresql.org/pgsql-hackers/2006-10/msg00663.php
[4] http://archives.postgresql.org/pgsql-hackers/2006-08/msg00506.php
[5] portal.acm.org/ft_gateway.cfm?id=1559955&type=pdf
