hadoop-hive-dev mailing list archives

From "Emil Ibrishimov (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-607) Create statistical UDFs.
Date Tue, 28 Jul 2009 02:02:15 GMT

    [ https://issues.apache.org/jira/browse/HIVE-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735890#action_12735890 ]

Emil Ibrishimov commented on HIVE-607:
--------------------------------------

Hey Scott. The formula you are using loses precision when the variance is very small
relative to the sum of squares: devavg and avg*avg can both grow very large while the
variance computed from them stays small, so most of the significant digits are lost in the
subtraction, and the result can even come out negative.
I am using a modification of the on-line algorithm described at
http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#On-line_algorithm
which avoids this problem.
I will attach a patch tomorrow when I'm done testing it.
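A minimal Java sketch of the difference, assuming a plain double[] input instead of Hive's UDAF interfaces. The class and method names are hypothetical and are not taken from UDAFStddev.java or the upcoming patch. naiveVarPop assumes the criticized formula has the form "mean of squares minus squared mean"; welfordVarPop is the plain on-line update from the Wikipedia page, not the modified version mentioned above.

```java
// Hypothetical illustration only; not the Hive UDAF API or the HIVE-607 patch.
class VarianceDemo {

    // Assumed shape of the naive formula: mean of squares minus squared mean.
    // Both terms can be huge while their difference is tiny, so the
    // subtraction cancels most significant digits.
    static double naiveVarPop(double[] xs) {
        double sum = 0.0, sumSq = 0.0;
        for (double x : xs) {
            sum += x;
            sumSq += x * x;
        }
        double n = xs.length;
        double avg = sum / n;
        return sumSq / n - avg * avg;
    }

    // Welford's on-line algorithm: keeps a running mean and m2, the running
    // sum of squared deviations from the mean, updated once per value.
    static double welfordVarPop(double[] xs) {
        long n = 0;
        double mean = 0.0, m2 = 0.0;
        for (double x : xs) {
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean); // second factor uses the updated mean
        }
        return m2 / n; // population variance
    }

    public static void main(String[] args) {
        // Large offset, small spread: the exact population variance is 22.5.
        double[] xs = {1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16};
        System.out.println("naive:   " + naiveVarPop(xs));
        System.out.println("welford: " + welfordVarPop(xs));
    }
}
```

For these inputs the Welford version returns exactly 22.5, while the naive version's result is dominated by rounding error in the squared terms, each of which is close to 1e18.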

> Create statistical UDFs.
> ------------------------
>
>                 Key: HIVE-607
>                 URL: https://issues.apache.org/jira/browse/HIVE-607
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: S. Alex Smith
>            Assignee: Emil Ibrishimov
>            Priority: Minor
>         Attachments: UDAFStddev.java
>
>
> Create UDFs replicating:
> STD() 	Return the population standard deviation
> STDDEV_POP()(v5.0.3) 	Return the population standard deviation
> STDDEV_SAMP()(v5.0.3) 	Return the sample standard deviation
> STDDEV() 	Return the population standard deviation
> SUM() 	Return the sum
> VAR_POP()(v5.0.3) 	Return the population variance
> VAR_SAMP()(v5.0.3) 	Return the sample variance
> VARIANCE()(v4.1) 	Return the population variance
> as found at http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html.
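Apart from SUM(), the functions in this list differ only in the divisor (n for population, n - 1 for sample) and in whether a square root is taken. A rough sketch, assuming the aggregate ends with the row count n and m2, the sum of squared deviations from the mean (a hypothetical name here). Returning NaN when there are too few rows is an assumption standing in for SQL NULL; the class is illustrative, not Hive or MySQL code.

```java
// Hypothetical helper mapping the MySQL-style functions listed above onto
// two aggregate quantities: row count n and m2 (sum of squared deviations
// from the mean).
class StatFunctions {

    // VAR_POP(), VARIANCE(): divide by n.
    static double varPop(long n, double m2) {
        return n > 0 ? m2 / n : Double.NaN; // NaN stands in for SQL NULL
    }

    // VAR_SAMP(): divide by n - 1 (Bessel's correction).
    static double varSamp(long n, double m2) {
        return n > 1 ? m2 / (n - 1) : Double.NaN;
    }

    // STD(), STDDEV(), STDDEV_POP(): square root of the population variance.
    static double stddevPop(long n, double m2) {
        return Math.sqrt(varPop(n, m2));
    }

    // STDDEV_SAMP(): square root of the sample variance.
    static double stddevSamp(long n, double m2) {
        return Math.sqrt(varSamp(n, m2));
    }

    public static void main(String[] args) {
        // For the values 2, 4, 4, 4, 5, 5, 7, 9: n = 8, mean = 5, m2 = 32.
        long n = 8;
        double m2 = 32.0;
        System.out.println(varPop(n, m2));    // 4.0
        System.out.println(stddevPop(n, m2)); // 2.0
        System.out.println(varSamp(n, m2));
        System.out.println(stddevSamp(n, m2));
    }
}
```

Keeping a single (n, m2) state lets one accumulator back all of these functions, with only the final step differing.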

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

