hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mich Talebzadeh" <>
Subject My query on Standard deviation function STDDEV() in Hive
Date Fri, 27 Mar 2015 21:49:01 GMT
Hi gurus,


STDDEV function can be used for both aggregates and analytics. Fortunately
in Hive it has been implemented which is great. I have a simple question on
this if I may


I would expect the in-built function STDDEV to run pretty efficiently in
most databases as they are system defined functions. At least that is my
understanding. I did a test on Hive with both STDDEV in-built function and
also the equivalent of it that I wrote myself


In this example the base table sales 918,843 rows imported from Oracle
sh.sales table. and I am trying to work STDDEV for as column called



        WHEN  COUNT (amount_sold) <= 1

           THEN  0



FROM sales;


I ran this first with straight forward calculation for standard deviation


1)     It returned 259.7804899502628               in 19.656 seconds


Then I ran it immediately after using STDDV function itself in hive


hive> SELECT STDDEV(amount_sold) from sales;


2)     It retuned 259.7803485874695                in 20.346 seconds


Now the value itself is pretty close but seems to be taking slightly more
time in 2)


When I did the same tests on base table in Oracle I got first for my



  2          WHEN  COUNT (amount_sold) <= 1

  3             THEN  0

  4           ELSE

  5           END

  6* FROM sh.sales


hddtester@MYDB.MICH.LOCAL> /






Elapsed: 00:00:00.23


It took 230 ms. Then I ran again using in built function in Oracle



hddtester@MYDB.MICH.LOCAL> SELECT STDDEV(amount_sold) FROM sh.sales;






Elapsed: 00:00:00.16


Right I would expect the in-built function to be more efficient and quicker.
In Oracle it took 160 ms compared to 230 ms from my own calculation.


My query is why in Hive it takes longer to do the same calculation with
in-built STDDEV function? Is this behaviour expected.







Publications due shortly:

Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
Coherence Cache


NOTE: The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediately. Any information in this
message shall not be understood as given or endorsed by Peridale Ltd, its
subsidiaries or their employees, unless expressly so stated. It is the
responsibility of the recipient to ensure that this email is virus free,
therefore neither Peridale Ltd, its subsidiaries nor their employees accept
any responsibility.


View raw message