hadoop-pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alan Gates <ga...@yahoo-inc.com>
Subject Re: optimizer hints in Pig
Date Mon, 16 Nov 2009 23:15:08 GMT
In general I think optimizer hints fit well with Pig's approach to  
data processing, as expressed in our philosophic statement that Pigs  
are domestic animals (see http://hadoop.apache.org/pig/ 
philosophy.html ).

At least in the examples you give, I don't see 'with' as binding.  The  
user is giving Pig information; it can choose how to use it, or to not  
use it all.  I would like 'using' to continue to be binding as in that  
case the user is explicitly telling Pig to do something in a  
particular way.


On Nov 14, 2009, at 2:07 PM, Ashutosh Chauhan wrote:

> Hi All,
> We would like to know what Pig devs feel about optimizer hints.
> Traditionally, optimizer hints have been received with mixed reactions
> in RDBMS world.  Oracle provides lots of knobs[1][2] to turn and tune,
> while postgres[3][4] have tried to stay away from them. Mysql have few
> of them (e.g., straight_join). Surajit Chaudhary [5] (Microsoft) is
> making case in favor of them.
> More specifically, I am talking of hints like following
> a = filter 'mydata' by myudf ($1) with "selectivity 0.5";
> // This is letting user to tell Pig that  myudf filters out nearly
> half of tuples of 'mydata'.
> c = join a by $0, b by $0 with "selectivity a.$0 = b.$0, 0.1";
> // This is letting user to tell Pig that only 10% of keys in a will
> match with those in b.
> Exact syntax isn't important it could be adapted. But, question is
> does it seem to be  a useful enough idea to be added in Pig Latin.
> Pig's case is slightly different from other sql engines in that while
> other systems treats them as "hints" and thus are free to ignore them
> Pig treats hints as commands in a sense that it will fail even if it
> can figure out that hint will result in failure of query. Perhaps, Pig
> can interpret "using" as command and "with" as hint.
> Thoughts?
> Ashutosh
> [1] http://www.dba-oracle.com/art_otn_cbo_p7.htm
> [2] http://www.dba-oracle.com/oracle11g/oracle_11g_extended_optimizer_statistics.htm
> [3] http://archives.postgresql.org/pgsql-hackers/2006-10/msg00663.php
> [4] http://archives.postgresql.org/pgsql-hackers/2006-08/msg00506.php
> [5] portal.acm.org/ft_gateway.cfm?id=1559955&type=pdf

View raw message