hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: Standard Deviation in Hive 2 is still incorrect
Date Wed, 20 Apr 2016 05:54:22 GMT
Will do thanks

Dr Mich Talebzadeh



LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com



On 19 April 2016 at 23:33, Alan Gates <alanfgates@gmail.com> wrote:

> Have you filed a JIRA ticket for this?  If not, please do so we can track
> it and fix it.  Patches are welcomed as well. :)
>
> Alan.
>
> > On Apr 4, 2016, at 15:27, Mich Talebzadeh <mich.talebzadeh@gmail.com>
> wrote:
> >
> >
> > Hi,
> >
> > I reported back in April 2015 that what Hive calls Standard Deviation
> Function  STDDEV is a pointer to STDDEV_POP. This is incorrect and has not
> been rectified in Hive 2
> >
> > Both Oracle and Sybase point STDDEV to STDDEV_SAMP not STDDEV_POP. Also
> I did tests with Spark 1.6 as well and Spark correctly points STTDEV to
> STDDEV_SAMP.
> >
> > The following query was used
> >
> > SELECT
> >
>  SQRT((SUM(POWER(AMOUNT_SOLD,2))-(COUNT(1)*POWER(AVG(AMOUNT_SOLD),2)))/(COUNT(1)-1))
> AS MYSTDDEV,
> >         STDDEV(amount_sold) AS STDDEV,
> >         STDDEV_SAMP(amount_sold) AS STDDEV_SAMP,
> >         STDDEV_POP(amount_sold) AS STDDEV_POP
> > from    sales;
> >
> > The following is from running the above query on Hive where STDDEV -->
> STDDEV_POP which is incorrect
> >
> >
> >
> +--------------------+---------------------+--------------------+---------------------+--+
> > |      mystddev      |       stddev        |    stddev_samp     |
>  stddev_pop      |
> >
> +--------------------+---------------------+--------------------+---------------------+--+
> > | 260.7270919450411  | 260.72704617040444  | 260.7270722861465  |
> 260.72704617040444  |
> >
> +--------------------+---------------------+--------------------+---------------------+--+
> >
> > The following is from Spark-sql where STDDEV -->  STDDEV_SAMP which is
> correct
> >
> >
> +--------------------+---------------------+--------------------+---------------------+--+
> > |      mystddev      |       stddev        |    stddev_samp     |
>  stddev_pop      |
> >
> +--------------------+---------------------+--------------------+---------------------+--+
> > | 260.7270919450411  | 260.7270722861637   | 260.7270722861637  |
> 260.72704617042166  |
> >
> +--------------------+---------------------+--------------------+---------------------+--+
> >
> > Hopefully The Hive one will be corrected.
> >
> > Thanks
> >
> >
> > Dr Mich Talebzadeh
> >
> > LinkedIn
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> >
> > http://talebzadehmich.wordpress.com
> >
>
>

Mime
View raw message