hive-dev mailing list archives

From "Eric Hanson" <eh...@microsoft.com>
Subject Re: Review Request 19216: Vectorized variance computation differs from row mode computation.
Date Sat, 15 Mar 2014 00:07:32 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19216/#review37296
-----------------------------------------------------------

Ship it!


- Eric Hanson


On March 14, 2014, 8:41 a.m., Jitendra Pandey wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19216/
> -----------------------------------------------------------
> 
> (Updated March 14, 2014, 8:41 a.m.)
> 
> 
> Review request for hive, Eric Hanson and Remus Rusanu.
> 
> 
> Bugs: HIVE-6664
>     https://issues.apache.org/jira/browse/HIVE-6664
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The following query shows the difference:
> select var_samp(ss_sales_price), var_pop(ss_sales_price), stddev_pop(ss_sales_price),
> stddev_samp(ss_sales_price) from store_sales;
> 
> The difference arises because, when computing variance, row mode converts each decimal
> value to double up front and accumulates the sum of values as a double, whereas vector
> mode accumulates the local aggregate sum as a decimal and converts it to double only at
> flush.
> 
> 
> Diffs
> -----
> 
>   ql/src/gen/vectorization/UDAFTemplates/VectorUDAFVarDecimal.txt c5af930 
>   ql/src/test/results/clientpositive/vector_decimal_aggregate.q.out 507f798 
> 
> Diff: https://reviews.apache.org/r/19216/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jitendra Pandey
> 
>
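For reference, the four aggregates in the query above follow the standard textbook definitions below. These are written as general formulas, not taken from Hive's source. Every one of them depends on the mean, and therefore on the accumulated sum of values, so rounding that sum differently in row mode and vector mode can shift all four results.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\mathrm{var\_pop} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2, \qquad
\mathrm{var\_samp} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2,
\\
\mathrm{stddev\_pop} = \sqrt{\mathrm{var\_pop}}, \qquad
\mathrm{stddev\_samp} = \sqrt{\mathrm{var\_samp}}
```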

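The mechanism described in the review can be sketched in Java. This is a hypothetical illustration, not Hive's code. The class and method names are invented, the input is made-up data, and the variance uses a simplified two-pass formula rather than Hive's streaming evaluator. It contrasts the two accumulation strategies: converting each decimal to double before summing, and summing exactly as a decimal and converting once at the end.

```java
import java.math.BigDecimal;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration class; not part of Hive.
class VarianceAccumulationSketch {

    // "Row mode" as described in the review: convert each decimal to double
    // up front, then accumulate the running sum in double precision.
    static double sumAsDouble(List<BigDecimal> values) {
        double sum = 0.0;
        for (BigDecimal v : values) {
            sum += v.doubleValue();
        }
        return sum;
    }

    // "Vector mode" as described: accumulate the sum exactly as a decimal
    // and convert it to double only once, at flush time.
    static double sumAsDecimalThenConvert(List<BigDecimal> values) {
        BigDecimal sum = BigDecimal.ZERO;
        for (BigDecimal v : values) {
            sum = sum.add(v);
        }
        return sum.doubleValue();
    }

    // Simplified two-pass population variance (an assumption, not Hive's
    // streaming update). The mean is derived from whichever sum is supplied.
    static double popVariance(List<BigDecimal> values, double sum) {
        int n = values.size();
        double mean = sum / n;
        double acc = 0.0;
        for (BigDecimal v : values) {
            double d = v.doubleValue() - mean;
            acc += d * d;
        }
        return acc / n;
    }

    public static void main(String[] args) {
        // Ten identical prices of 0.10 (hypothetical data).
        List<BigDecimal> prices = Collections.nCopies(10, new BigDecimal("0.10"));

        double rowSum = sumAsDouble(prices);
        double vecSum = sumAsDecimalThenConvert(prices);
        System.out.println("row-mode sum:        " + rowSum);
        System.out.println("vector-mode sum:     " + vecSum);
        System.out.println("row-mode var_pop:    " + popVariance(prices, rowSum));
        System.out.println("vector-mode var_pop: " + popVariance(prices, vecSum));
    }
}
```

With this input, the double path sums to 0.9999999999999999 while the decimal path yields exactly 1.0. That small gap in the mean is enough to turn an exact var_pop of 0.0 into a tiny nonzero value, which is the kind of divergence the query above exposes.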
