pig-dev mailing list archives

From Dmitriy Ryaboy <dvrya...@gmail.com>
Subject Sometimes-Algebraic functions
Date Mon, 29 Aug 2011 22:46:31 GMT
Had a fun discovery today: a user tried to do the following (perfectly
reasonable) thing:

bag_sizes = foreach grouped_data generate
  group, SIZE(grouped_data);

... and it was excessively slow, because SIZE is not algebraic, so the
combiner never gets used.

I had him switch to COUNT_STAR, and that worked, of course.

But he reasonably pointed out that SIZE should work, too. It's documented to
work on bags, after all.

I tried switching SIZE's argToFuncMapping to return
COUNT_STAR.class.getName(), but discovered, much to my dismay, that the
algebraic optimization still does not get invoked.

Presumably that's because we check whether SIZE itself is an instanceof
Algebraic, instead of checking the class that's returned by argToFuncMapping.
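
To make the suspicion concrete, here is a minimal, self-contained sketch of
the two kinds of check. Everything in it is a stand-in for illustration: the
Algebraic, EvalFunc, Size and CountStar types below are simplified toys, not
the real org.apache.pig classes, and resolveFor() is a hypothetical
simplification of what the arg-to-func mapping does. It only shows why an
instanceof test on the declared function can miss an algebraic class that the
mapping would substitute.

```java
public class AlgebraicCheckSketch {

    // Stand-in marker interface (NOT the real Pig Algebraic interface).
    interface Algebraic {}

    // Stand-in base class (NOT the real Pig EvalFunc).
    static class EvalFunc {
        // Hypothetical simplification of the arg-to-func mapping:
        // returns the class that should actually run for this input type.
        Class<? extends EvalFunc> resolveFor(String inputType) {
            return getClass();
        }
    }

    // Toy algebraic counter.
    static class CountStar extends EvalFunc implements Algebraic {}

    // Toy SIZE: not algebraic itself, but maps bag input to CountStar.
    static class Size extends EvalFunc {
        @Override
        Class<? extends EvalFunc> resolveFor(String inputType) {
            if ("bag".equals(inputType)) {
                return CountStar.class;
            }
            return Size.class;
        }
    }

    // Check only the declared function, as the message suspects happens today.
    static boolean declaredIsAlgebraic(EvalFunc f) {
        return f instanceof Algebraic;
    }

    // Check the class the mapping resolves to for the given input type.
    static boolean resolvedIsAlgebraic(EvalFunc f, String inputType) {
        return Algebraic.class.isAssignableFrom(f.resolveFor(inputType));
    }

    public static void main(String[] args) {
        EvalFunc size = new Size();
        System.out.println(declaredIsAlgebraic(size));              // false
        System.out.println(resolvedIsAlgebraic(size, "bag"));       // true
        System.out.println(resolvedIsAlgebraic(size, "chararray")); // false
    }
}
```

In this toy model the first check never sees CountStar, which matches the
behavior described above; whether the real optimizer works this way is
exactly the open question.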

Is that about right? Do you guys agree that's a bug?

