hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mich Talebzadeh" <m...@peridale.co.uk>
Subject RE: ORDER BY clause in Hive
Date Mon, 30 Mar 2015 08:46:17 GMT
Gents,

 

Hive as I see it does not support ORDER BY Column position. It only supports
ORDER BY Column name.

 

Thanks

 

 

Mich Talebzadeh

 

http://talebzadehmich.wordpress.com

 

Publications due shortly:

Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
Coherence Cache

 

NOTE: The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediately. Any information in this
message shall not be understood as given or endorsed by Peridale Ltd, its
subsidiaries or their employees, unless expressly so stated. It is the
responsibility of the recipient to ensure that this email is virus free,
therefore neither Peridale Ltd, its subsidiaries nor their employees accept
any responsibility.

 

From: Gopal Vijayaraghavan [mailto:gopal@hortonworks.com] On Behalf Of Gopal
Vijayaraghavan
Sent: 30 March 2015 05:26
To: user@hive.apache.org
Cc: Lefty Leverenz
Subject: Re: ORDER BY clause in Hive

 

Hi Lefty,

 

Couldn't find the documentation for what
hive.groupby.orderby.position.alias=true does.

 

I suspect that might be what Mich was looking for (though I tend to write
the column names explicitly).

 

Cheers,

Gopal

 

From: Lefty Leverenz <leftyleverenz@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Sunday, March 29, 2015 at 8:32 PM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: ORDER BY clause in Hive

 

I added information about this in the Order By
<https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#Lang
uageManualSortBy-SyntaxofOrderBy>  section of the wiki.  Thanks, Mich and
Gopal!




-- Lefty

 

On Sun, Mar 29, 2015 at 5:24 AM, Mich Talebzadeh <mich@peridale.co.uk>
wrote:

Hi Lefty, Gopal,

 

It appears that ORDER BY 3 is not interpreted as ORDERR BY "the result set
column three" which standard SQL as evident from Oracle and Sybase does.

 

So I made it an ORDER BY from the result set EXPLICITELY as shown below and
it worked OK

 

SELECT

          rs.Customer_ID

        , rs.Number_of_orders

        , rs.Total_customer_amount

        , rs.Average_order

        , rs.Standard_deviation

FROM

(

        SELECT cust_id AS Customer_ID,

        COUNT(amount_sold) AS Number_of_orders,

        SUM(amount_sold) AS Total_customer_amount,

        AVG(amount_sold) AS Average_order,

        STDDEV(amount_sold) AS Standard_deviation

        FROM sales

        GROUP BY cust_id

        HAVING SUM(amount_sold) > 94000

        AND AVG(amount_sold) < STDDEV(amount_sold)

) rs

ORDER BY

          rs.Total_customer_amount

;

 

 

+-----------------+----------------------+---------------------------+------
---------------+------------------------+--+

| rs.customer_id  | rs.number_of_orders  | rs.total_customer_amount  |
rs.average_order   | rs.standard_deviation  |

+-----------------+----------------------+---------------------------+------
---------------+------------------------+--+

| 1743.0          | 238                  | 94786.12999999993         |
398.2610504201678   | 581.0439095219863      |

| 429.0           | 231                  | 94819.41000000006         |
410.4736363636366   | 613.7057080691426      |

| 2994.0          | 227                  | 94862.61000000006         |
417.89696035242315  | 623.1607772763742      |

| 6395.0          | 268                  | 97010.47999999998         |
361.97940298507456  | 576.9120977984521      |

| 12783.0         | 240                  | 97573.54999999996         |
406.5564583333332   | 590.4445500393804      |

| 4974.0          | 235                  | 98006.16000000002         |
417.0474893617022   | 624.337482834059       |

| 42167.0         | 266                  | 98585.96000000002         |
370.6239097744362   | 590.965120684093       |

| 10747.0         | 256                  | 99578.08999999997         |
388.9769140624999   | 600.7615005975689      |

| 11407.0         | 248                  | 103412.65999999995        |
416.9865322580643   | 622.221465710723       |

+-----------------+----------------------+---------------------------+------
---------------+------------------------+--+

9 rows selected (209.699 seconds)

 

 

Regards,

 

Mich Talebzadeh

 

http://talebzadehmich.wordpress.com

 

Publications due shortly:

Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
Coherence Cache

 

NOTE: The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediately. Any information in this
message shall not be understood as given or endorsed by Peridale Ltd, its
subsidiaries or their employees, unless expressly so stated. It is the
responsibility of the recipient to ensure that this email is virus free,
therefore neither Peridale Ltd, its subsidiaries nor their employees accept
any responsibility.

 

From: Mich Talebzadeh [mailto:mich@peridale.co.uk] 
Sent: 29 March 2015 00:11
To: user@hive.apache.org
Subject: ORDER BY clause in Hive

 

Hi,

 

Can someone point me to doc or otherwise to see if ORDER BY clause in Hive
is working OK

 

I have a simple aggregate query as follows:

 

SELECT cust_id AS Customer_ID,

COUNT(amount_sold) AS Number_of_orders,

SUM(amount_sold) AS Total_customer_amount,

AVG(amount_sold) AS Average_order,

STDDEV(amount_sold) AS Standard_deviation

FROM sales

GROUP BY cust_id

HAVING SUM(amount_sold) > 94000

AND AVG(amount_sold) < STDDEV(amount_sold)

ORDER BY 3 ;

 

The original table and data are from Oracle sh.sales table

 

Oracle comes back for this query with

 

CUSTOMER_ID NUMBER_OF_ORDERS TOTAL_CUSTOMER_AMOUNT AVERAGE_ORDER
STANDARD_DEVIATION

----------- ---------------- --------------------- -------------
------------------

       1743              238              94786.13     398.26105
582.26845

        429              231              94819.41    410.473636
615.038404

       2994              227              94862.61     417.89696
624.53793

       6395              268              97010.48    361.979403
577.991448

      12783              240              97573.55    406.556458
591.6785

       4974              235              98006.16    417.047489
625.670115

      42167              266              98585.96     370.62391
592.079099

      10747              256              99578.09    388.976914
601.938312

      11407              248             103412.66    416.986532
623.479751

 

9 rows selected.

 

Ordered by TOTAL_CUSTOMER_AMOUNT

 

And hive returns for the same query

 

+--------------+-------------------+------------------------+---------------
------+---------------------+--+

| customer_id  | number_of_orders  | total_customer_amount  |
average_order    | standard_deviation  |

+--------------+-------------------+------------------------+---------------
------+---------------------+--+

| 42167.0      | 266               | 98585.96000000002      |
370.6239097744362   | 590.965120684093    |

| 12783.0      | 240               | 97573.54999999996      |
406.5564583333332   | 590.4445500393804   |

| 11407.0      | 248               | 103412.65999999995     |
416.9865322580643   | 622.221465710723    |

| 10747.0      | 256               | 99578.08999999997      |
388.9769140624999   | 600.7615005975689   |

| 6395.0       | 268               | 97010.47999999998      |
361.97940298507456  | 576.9120977984521   |

| 4974.0       | 235               | 98006.16000000002      |
417.0474893617022   | 624.337482834059    |

| 2994.0       | 227               | 94862.61000000006      |
417.89696035242315  | 623.1607772763742   |

| 1743.0       | 238               | 94786.12999999993      |
398.2610504201678   | 581.0439095219863   |

| 429.0        | 231               | 94819.41000000006      |
410.4736363636366   | 613.7057080691426   |

+--------------+-------------------+------------------------+---------------
------+---------------------+--+

9 rows selected (215.774 seconds)

 

But ordering in Hive does not seem to be correct! Please note ordering is on
column three, total_customer_amount

 

I also tried this in Sybase and got the same as Oracle.

 

Adaptive Server cpu time: 100 ms.

Customer_ID          Number_of_orders Total_customer_amount
Average_order                                        Standard_deviation

-------------------- ----------------
-----------------------------------------
----------------------------------------------------
---------------------------

                1743              238
94786.13                                    398.2610504201680
582.268450

                  429              231
94819.41                                    410.4736363636363
615.038404

                 2994              227
94862.61                                    417.8969603524229
624.537930

                 6395              268
97010.48                                    361.9794029850746
577.991448

                12783              240
97573.55                                    406.5564583333333
591.678500

                 4974              235
98006.16                                    417.0474893617021
625.670115

                42167              266
98585.96                                    370.6239097744360
592.079099

                10747              256
99578.09                                    388.9769140625000
601.938312

                11407              248
103412.66                                    416.9865322580645
623.479751

 

I tried Google search and seems to be different suggestions. May be I have
to rewrite the code?

 

Thanks

 

Mich Talebzadeh

 

http://talebzadehmich.wordpress.com

 

Publications due shortly:

Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
Coherence Cache

 

NOTE: The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediately. Any information in this
message shall not be understood as given or endorsed by Peridale Ltd, its
subsidiaries or their employees, unless expressly so stated. It is the
responsibility of the recipient to ensure that this email is virus free,
therefore neither Peridale Ltd, its subsidiaries nor their employees accept
any responsibility.

 

 


Mime
View raw message