hive-dev mailing list archives

From "David Phillips (JIRA)" <>
Subject [jira] Assigned: (HIVE-165) var(col) built-in to go with avg(col) and count(col)
Date Sat, 13 Dec 2008 01:02:44 GMT


David Phillips reassigned HIVE-165:

    Assignee: David Phillips

> var(col) built-in to go with avg(col) and count(col)
> ----------------------------------------------------
>                 Key: HIVE-165
>                 URL:
>             Project: Hadoop Hive
>          Issue Type: Wish
>            Reporter: Adam Kramer
>            Assignee: David Phillips
>            Priority: Minor
> The last step in the unholy triumvirate of statistical built-ins is the variance. We
> already have the n (count) and the mean (avg). I currently have a job or two that filters
> all of the data into a single reducer which just computes mean/n/variance and writes it to
> a my guess is that this would be a pretty big speed increase. Not a huge deal though,
> as computing the variance myself is trivial.
> (Average, variance, and n can be co-computed in one pass, so if you're doing var() you
> can basically have avg() and count() for free.)
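The one-pass claim above can be sketched outside Hive as a small accumulator that keeps n, the running mean, and the sum of squared deviations (M2) together. This is a minimal illustration in Python, not Hive's UDAF API: the names `Moments` and `moments` are hypothetical, and the technique (Welford's online update plus the pairwise merge formula of Chan, Golub, and LeVeque) is one standard choice, not something the issue specifies. The `merge` step is what would let partial results from several tasks be combined instead of routing every row to a single reducer.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Moments:
    """Hypothetical accumulator: count, running mean, and M2 (sum of squared deviations)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def add(self, x: float) -> None:
        # Welford's update: count, mean, and M2 advance together in one pass,
        # which is numerically stabler than accumulating sum and sum of squares.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "Moments") -> "Moments":
        # Pairwise combination of two partial aggregates (e.g. from two tasks);
        # returns a new accumulator and leaves both inputs unchanged.
        if other.n == 0:
            return Moments(self.n, self.mean, self.m2)
        if self.n == 0:
            return Moments(other.n, other.mean, other.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Moments(n, mean, m2)

    def var_pop(self) -> float:
        # Population variance; raises ZeroDivisionError when n == 0.
        return self.m2 / self.n

    def var_samp(self) -> float:
        # Sample variance; raises ZeroDivisionError when n == 1.
        return self.m2 / (self.n - 1)


def moments(values: Iterable[float]) -> Moments:
    """Fold an iterable into a Moments accumulator in a single pass."""
    acc = Moments()
    for x in values:
        acc.add(x)
    return acc


if __name__ == "__main__":
    data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    whole = moments(data)
    # Same statistics from two partial aggregates, as two tasks might produce.
    split = moments(data[:3]).merge(moments(data[3:]))
    print(whole.n, whole.mean, whole.var_pop(), whole.var_samp())
    print(split.n, split.mean, split.var_pop(), split.var_samp())
```

Both printed lines should agree up to floating-point rounding, which is the property that makes the aggregate safe to compute in pieces. Whether var() should mean the population or the sample variance is left open by the issue, so the sketch exposes both.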

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
