hadoop-pig-dev mailing list archives

From Dmitriy Ryaboy <dvrya...@cloudera.com>
Subject Re: COUNT, AVG and nulls
Date Mon, 06 Jul 2009 17:58:31 GMT
+1 for standard semantics.

We need a COALESCE function to go along with this.


On Mon, Jul 6, 2009 at 10:46 AM, Olga Natkovich <olgan@yahoo-inc.com> wrote:

> Hi,
> The current implementation of COUNT and AVG in Pig counts null values.
> This is inconsistent with SQL semantics and also with the semantics of
> other aggregate functions such as SUM, MIN, and MAX. Originally we chose
> this implementation for performance reasons; however, we have since
> re-implemented both functions to support a multi-step combiner, and the
> cost of checking for nulls when the combiner is invoked is now trivial.
> (I ran some tests with COUNT and they showed no performance difference.)
> We will pay a penalty in the non-combinable case, including local mode,
> but I think it is worth the price to have consistent semantics. Also, as
> we are working on SQL support, having SQL-compliant semantics becomes
> very desirable. Please let us know if you have any concerns. I am
> planning to make the change later this week.
> Olga
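
For readers who want to see the difference concretely, here is a rough sketch in Python (not Pig code, and not the actual Pig implementation) of the semantics discussed above. The function names sql_count, sql_avg, legacy_count and coalesce are hypothetical and exist only for this illustration; None stands in for a null field.

```python
from typing import Iterable, Optional


def sql_count(values: Iterable[Optional[float]]) -> int:
    """Count only non-null values, as SQL's COUNT(column) does."""
    return sum(1 for v in values if v is not None)


def sql_avg(values: Iterable[Optional[float]]) -> Optional[float]:
    """Average the non-null values; return None if there are none."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None
    return sum(non_null) / len(non_null)


def legacy_count(values: Iterable[Optional[float]]) -> int:
    """Count every value, nulls included (the behavior described as current)."""
    return sum(1 for _ in values)


def coalesce(*args):
    """Return the first non-null argument, or None if all are null."""
    for arg in args:
        if arg is not None:
            return arg
    return None


data = [1.0, None, 3.0]

print(legacy_count(data))  # nulls included in the count
print(sql_count(data))     # nulls skipped
print(sql_avg(data))       # average over non-null values only
# Treating nulls as zero is still possible by coalescing first.
print(sql_avg([coalesce(v, 0.0) for v in data]))
```

The last line shows why a COALESCE-style function pairs naturally with the change: anyone who relied on nulls being counted can opt back in explicitly by substituting a default before aggregating.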
