hadoop-hive-dev mailing list archives

From "Adam Kramer (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-362) avg() ignores null values; consider variant that doesn't
Date Thu, 09 Apr 2009 17:58:12 GMT

    [ https://issues.apache.org/jira/browse/HIVE-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697577#action_12697577 ]

Adam Kramer commented on HIVE-362:

This is not a problem anymore.

Just to be a bit opinionated for a few moments, though, I do believe the standards to be wrong
on this issue; NULL values are an excellent way to force scientists to really think about
the query they're running, and implicitly removing them will generally lead to harder-to-debug
errors and more wasted time than having to call the "remove nulls" version, call it avg_rn,

> avg() ignores null values; consider variant that doesn't
> --------------------------------------------------------
>                 Key: HIVE-362
>                 URL: https://issues.apache.org/jira/browse/HIVE-362
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Adam Kramer
> Some of the current aggregates (sum, avg) have a fairly standard behavior: If any item
> in the list is NULL, the sum, average, etc., cannot be computed. And so, NULL is returned.
> 1) If this is the case, the query should return much faster--see a null, return NULL,
> 2) It would be nice to have versions or ways to use these functions with NULL data--specifically,
> to treat the NULL as zero or to ignore the NULL and return the results for non-NULL data.
> This also would apply to the variance functions referenced in https://issues.apache.org/jira/browse/HIVE-165
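
For illustration only, the three NULL-handling behaviors discussed in the issue (return NULL as soon as a NULL is seen, ignore NULLs, treat NULLs as zero) can be sketched in Python, with None standing in for SQL NULL. The function names below are hypothetical and are not Hive functions; this is a rough sketch of the semantics, not Hive's implementation.

```python
from typing import Iterable, Optional


def avg_propagate_nulls(values: Iterable[Optional[float]]) -> Optional[float]:
    """Return None as soon as any None is seen (the early exit from point 1)."""
    total = 0.0
    count = 0
    for v in values:
        if v is None:
            return None  # short-circuit: the result cannot be computed
        total += v
        count += 1
    return total / count if count else None


def avg_ignore_nulls(values: Iterable[Optional[float]]) -> Optional[float]:
    """Average only the non-None items; None if there are none."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


def avg_nulls_as_zero(values: Iterable[Optional[float]]) -> Optional[float]:
    """Count every item, treating None as 0; None for an empty input."""
    items = [0.0 if v is None else v for v in values]
    return sum(items) / len(items) if items else None


if __name__ == "__main__":
    data = [1.0, None, 3.0]
    print(avg_propagate_nulls(data))
    print(avg_ignore_nulls(data))
    print(avg_nulls_as_zero(data))
```

On the input [1.0, None, 3.0], the first variant yields no result, the second averages the two present values, and the third divides their sum by three, so the choice of variant changes the answer whenever NULLs are present.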

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
