hive-dev mailing list archives

From "Jitendra Nath Pandey (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-6664) Vectorized variance computation differs from row mode computation.
Date Mon, 17 Mar 2014 21:24:46 GMT

    [ https://issues.apache.org/jira/browse/HIVE-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938407#comment-13938407 ]

Jitendra Nath Pandey commented on HIVE-6664:
--------------------------------------------

I have committed this to trunk.

[~rhbutani] This bug affects hive-0.13: vectorized execution returns different results than
row-mode execution. It should be fixed in branch-0.13 as well.


> Vectorized variance computation differs from row mode computation.
> ------------------------------------------------------------------
>
>                 Key: HIVE-6664
>                 URL: https://issues.apache.org/jira/browse/HIVE-6664
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Jitendra Nath Pandey
>            Assignee: Jitendra Nath Pandey
>         Attachments: HIVE-6664.1.patch, HIVE-6664.1.patch, HIVE-6664.1.patch
>
>
> The following query shows the difference:
> select var_samp(ss_sales_price), var_pop(ss_sales_price), stddev_pop(ss_sales_price),
> stddev_samp(ss_sales_price) from store_sales;
> The difference arises because row mode converts each decimal value to double up front
> when summing values for the variance computation, whereas vector mode accumulates the local
> aggregate sum as decimal and converts it to double only at flush.
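
The effect described above can be reproduced outside Hive. The following is a minimal
sketch in Python, not Hive's actual code: the function names and the sample values are
hypothetical, and it only models the summation step, not the full variance aggregation.
It assumes that "row mode" means converting each decimal to double before adding, and
that "vector mode" means adding exact decimals and converting the total once.

```python
from decimal import Decimal

# Hypothetical sample column values (not taken from store_sales).
values = [Decimal("0.1"), Decimal("0.2"), Decimal("0.3")]


def sum_row_mode(vals):
    """Convert each decimal to double first, then accumulate in double."""
    total = 0.0
    for v in vals:
        total += float(v)  # rounding error can enter at every addition
    return total


def sum_vector_mode(vals):
    """Accumulate exactly as decimal, convert to double once at the end."""
    total = Decimal(0)
    for v in vals:
        total += v  # exact decimal addition for these inputs
    return float(total)  # single conversion, analogous to "at flush"


row_sum = sum_row_mode(values)
vec_sum = sum_vector_mode(values)

if __name__ == "__main__":
    print("row mode sum:   ", repr(row_sum))
    print("vector mode sum:", repr(vec_sum))
    print("sums are equal: ", row_sum == vec_sum)
```

Because the two sums can differ in the last bits, any variance or standard deviation
derived from them can differ as well, which matches the mismatch reported here.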



--
This message was sent by Atlassian JIRA
(v6.2#6252)