hadoop-hive-dev mailing list archives

From "Jeff Hammerbacher (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HIVE-165) var(col) built-in to go with avg(col) and count(col)
Date Sun, 14 Dec 2008 23:05:44 GMT

     [ https://issues.apache.org/jira/browse/HIVE-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Jeff Hammerbacher updated HIVE-165:

    Component/s: Query Processor

Adding to "Query Processor" component.

> var(col) built-in to go with avg(col) and count(col)
> ----------------------------------------------------
>                 Key: HIVE-165
>                 URL: https://issues.apache.org/jira/browse/HIVE-165
>             Project: Hadoop Hive
>          Issue Type: Wish
>          Components: Query Processor
>            Reporter: Adam Kramer
>            Assignee: David Phillips
>            Priority: Minor
> The last step in the unholy triumvirate of statistical built-ins is the variance. We
> already have n (count) and the mean (avg). I currently have a job or two that funnel
> all of the data into a single reducer, which just computes mean/n/variance and writes it to
> a table, so my guess is that a built-in var() would be a pretty big speed increase. Not a
> huge deal, though, as computing the variance myself is trivial.
> (Average, variance, and n can be co-computed in one pass, so if you're doing var() you
> can basically have avg() and count() for free.)
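The one-pass claim in the issue can be made concrete. Below is a minimal Python sketch, not Hive code, assuming a hypothetical aggregation state (`Partial`) made of the count, the running mean, and the sum of squared deviations from the mean (M2); `Partial`, `accumulate`, `merge`, `aggregate` and `finish` are illustrative names, not Hive APIs. `accumulate()` is Welford's online update, which yields count, mean, and variance in a single pass. `merge()` is the standard pairwise combination formula for two such states, which is what would let partial results from many mappers be combined instead of sending every row to one reducer.

```python
"""One-pass count/avg/var with mergeable partial states (illustrative sketch)."""
from functools import reduce
from typing import Iterable, NamedTuple, Optional, Tuple


class Partial(NamedTuple):
    """Hypothetical aggregation state: count, running mean, sum of squared deviations."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0


def accumulate(state: Partial, x: float) -> Partial:
    """Welford update: fold one value into the state without a second pass."""
    n = state.n + 1
    delta = x - state.mean
    mean = state.mean + delta / n
    m2 = state.m2 + delta * (x - mean)  # second factor uses the *updated* mean
    return Partial(n, mean, m2)


def aggregate(values: Iterable[float]) -> Partial:
    """Stream a whole chunk of a column (e.g. one mapper's input) into a state."""
    return reduce(accumulate, values, Partial())


def merge(a: Partial, b: Partial) -> Partial:
    """Combine two partial states without revisiting the underlying rows."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Partial(n, mean, m2)


def finish(state: Partial) -> Tuple[int, Optional[float], Optional[float]]:
    """Return (count, avg, population variance); None stands in for SQL NULL."""
    if state.n == 0:
        return 0, None, None
    return state.n, state.mean, state.m2 / state.n
```

For example, aggregating `[2, 4, 4, 4]` and `[5, 5, 7, 9]` separately and merging the two states gives the same count (8), mean (5) and population variance (4), up to floating-point rounding, as aggregating all eight values at once. The sketch returns the population variance (M2 / n); the sample variance would be M2 / (n - 1).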

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
