hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mich Talebzadeh (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-13574) Standard Deviation in Hive 2 is still incorrect
Date Thu, 21 Apr 2016 09:37:25 GMT

     [ https://issues.apache.org/jira/browse/HIVE-13574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Mich Talebzadeh updated HIVE-13574:
-----------------------------------
    Priority: Major  (was: Minor)

> Standard Deviation in Hive 2 is still incorrect
> -----------------------------------------------
>
>                 Key: HIVE-13574
>                 URL: https://issues.apache.org/jira/browse/HIVE-13574
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>         Environment: All
>            Reporter: Mich Talebzadeh
>            Assignee: Mich Talebzadeh
>              Labels: patch
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I reported back in April 2015 that what Hive calls Standard Deviation Function  STDDEV
is a pointer to STDDEV_POP. This is incorrect and has not been rectified in Hive 2
> Both Oracle and Sybase point STDDEV to STDDEV_SAMP not STDDEV_POP. Also I did tests with
Spark 1.6 as well and Spark correctly points STTDEV to STDDEV_SAMP.
> The following query was used 
> SELECT
>         SQRT((SUM(POWER(AMOUNT_SOLD,2))-(COUNT(1)*POWER(AVG(AMOUNT_SOLD),2)))/(COUNT(1)-1))
AS MYSTDDEV,
>         STDDEV(amount_sold) AS STDDEV,
>         STDDEV_SAMP(amount_sold) AS STDDEV_SAMP,
>         STDDEV_POP(amount_sold) AS STDDEV_POP
> from    sales;
> The following is from running the above query on Hive where STDDEV -->  STDDEV_POP
which is incorrect
> +--------------------+---------------------+--------------------+---------------------+--+
> |      mystddev      |       stddev        |    stddev_samp     |     stddev_pop    
 |
> +--------------------+---------------------+--------------------+---------------------+--+
> | 260.7270919450411  | 260.72704617040444  | 260.7270722861465  | 260.72704617040444
 |
> +--------------------+---------------------+--------------------+---------------------+--+
> The following is from Spark-sql where STDDEV -->  STDDEV_SAMP which is correct
> +--------------------+---------------------+--------------------+---------------------+--+
> |      mystddev      |       stddev        |    stddev_samp     |     stddev_pop    
 |
> +--------------------+---------------------+--------------------+---------------------+--+
> | 260.7270919450411  | 260.7270722861637   | 260.7270722861637  | 260.72704617042166
 |
> +--------------------+---------------------+--------------------+---------------------+--+
> Hopefully The Hive one will be corrected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message