hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Lefty Leverenz <leftylever...@gmail.com>
Subject Fwd: ORDER BY clause in Hive
Date Tue, 31 Mar 2015 06:14:40 GMT
---------- Forwarded message ----------
From: Lefty Leverenz <leftyleverenz@gmail.com>
Date: Tue, Mar 31, 2015 at 1:47 AM
Subject: Re: ORDER BY clause in Hive
To: Mich Talebzadeh <mich@peridale.co.uk>


Hive as I see it does not support ORDER BY *Column position*. It only
> supports ORDER BY *Column name*.
>

That's just in Hive release 0.10.0 and earlier.  In release 0.11.0+ you can set
the configuration parameter
<https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration#AdminManualConfiguration-ConfiguringHive>
 *hive.groupby.orderby.position.**alias* to true, and then you can use
column positions in ORDER BY.

Here's the new documentation:

   - hive.groupby.orderby.position.alias
   <https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.groupby.orderby.position.alias>
   - Order By
   <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy>

   - Group By
   <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+GroupBy#LanguageManualGroupBy-GroupBySyntax>


-- Lefty

On Mon, Mar 30, 2015 at 4:46 AM, Mich Talebzadeh <mich@peridale.co.uk>
wrote:

> Gents,
>
>
>
> Hive as I see it does not support ORDER BY *Column position*. It only
> supports ORDER BY *Column name*.
>
>
>
> Thanks
>
>
>
>
>
> Mich Talebzadeh
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> *Publications due shortly:*
>
> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
> Coherence Cache*
>
>
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this
> message shall not be understood as given or endorsed by Peridale Ltd, its
> subsidiaries or their employees, unless expressly so stated. It is the
> responsibility of the recipient to ensure that this email is virus free,
> therefore neither Peridale Ltd, its subsidiaries nor their employees accept
> any responsibility.
>
>
>
> *From:* Gopal Vijayaraghavan [mailto:gopal@hortonworks.com] *On Behalf Of
> *Gopal Vijayaraghavan
> *Sent:* 30 March 2015 05:26
> *To:* user@hive.apache.org
> *Cc:* Lefty Leverenz
>
> *Subject:* Re: ORDER BY clause in Hive
>
>
>
> Hi Lefty,
>
>
>
> Couldn’t find the documentation for what
> hive.groupby.orderby.position.alias=true does.
>
>
>
> I suspect that might be what Mich was looking for (though I tend to write
> the column names explicitly).
>
>
>
> Cheers,
>
> Gopal
>
>
>
> *From: *Lefty Leverenz <leftyleverenz@gmail.com>
> *Reply-To: *"user@hive.apache.org" <user@hive.apache.org>
> *Date: *Sunday, March 29, 2015 at 8:32 PM
> *To: *"user@hive.apache.org" <user@hive.apache.org>
> *Subject: *Re: ORDER BY clause in Hive
>
>
>
> I added information about this in the Order By
> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy>
> section of the wiki.  Thanks, Mich and Gopal!
>
>
> -- Lefty
>
>
>
> On Sun, Mar 29, 2015 at 5:24 AM, Mich Talebzadeh <mich@peridale.co.uk>
> wrote:
>
> Hi Lefty, Gopal,
>
>
>
> It appears that ORDER BY 3 is not interpreted as ORDERR BY “the result set
> column three” which standard SQL as evident from Oracle and Sybase does.
>
>
>
> So I made it an ORDER BY from the result set EXPLICITELY as shown below
> and it worked OK
>
>
>
> SELECT
>
>           rs.Customer_ID
>
>         , rs.Number_of_orders
>
>         , rs.Total_customer_amount
>
>         , rs.Average_order
>
>         , rs.Standard_deviation
>
> FROM
>
> (
>
>         SELECT cust_id AS Customer_ID,
>
>         COUNT(amount_sold) AS Number_of_orders,
>
>         SUM(amount_sold) AS Total_customer_amount,
>
>         AVG(amount_sold) AS Average_order,
>
>         STDDEV(amount_sold) AS Standard_deviation
>
>         FROM sales
>
>         GROUP BY cust_id
>
>         HAVING SUM(amount_sold) > 94000
>
>         AND AVG(amount_sold) < STDDEV(amount_sold)
>
> ) rs
>
> ORDER BY
>
>           *rs.Total_customer_amount*
>
> ;
>
>
>
>
>
>
> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>
> | rs.customer_id  | rs.number_of_orders  | rs.total_customer_amount  |
> rs.average_order   | rs.standard_deviation  |
>
>
> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>
> | 1743.0          | 238                  | 94786.12999999993         |
> 398.2610504201678   | 581.0439095219863      |
>
> | 429.0           | 231                  | 94819.41000000006         |
> 410.4736363636366   | 613.7057080691426      |
>
> | 2994.0          | 227                  | 94862.61000000006         |
> 417.89696035242315  | 623.1607772763742      |
>
> | 6395.0          | 268                  | 97010.47999999998         |
> 361.97940298507456  | 576.9120977984521      |
>
> | 12783.0         | 240                  | 97573.54999999996         |
> 406.5564583333332   | 590.4445500393804      |
>
> | 4974.0          | 235                  | 98006.16000000002         |
> 417.0474893617022   | 624.337482834059       |
>
> | 42167.0         | 266                  | 98585.96000000002         |
> 370.6239097744362   | 590.965120684093       |
>
> | 10747.0         | 256                  | 99578.08999999997         |
> 388.9769140624999   | 600.7615005975689      |
>
> | 11407.0         | 248                  | 103412.65999999995        |
> 416.9865322580643   | 622.221465710723       |
>
>
> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>
> 9 rows selected (209.699 seconds)
>
>
>
>
>
> Regards,
>
>
>
> Mich Talebzadeh
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> *Publications due shortly:*
>
> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
> Coherence Cache*
>
>
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this
> message shall not be understood as given or endorsed by Peridale Ltd, its
> subsidiaries or their employees, unless expressly so stated. It is the
> responsibility of the recipient to ensure that this email is virus free,
> therefore neither Peridale Ltd, its subsidiaries nor their employees accept
> any responsibility.
>
>
>
> *From:* Mich Talebzadeh [mailto:mich@peridale.co.uk]
> *Sent:* 29 March 2015 00:11
> *To:* user@hive.apache.org
> *Subject:* ORDER BY clause in Hive
>
>
>
> Hi,
>
>
>
> Can someone point me to doc or otherwise to see if ORDER BY clause in Hive
> is working OK
>
>
>
> I have a simple aggregate query as follows:
>
>
>
> SELECT cust_id AS Customer_ID,
>
> COUNT(amount_sold) AS Number_of_orders,
>
> SUM(amount_sold) AS Total_customer_amount,
>
> AVG(amount_sold) AS Average_order,
>
> STDDEV(amount_sold) AS Standard_deviation
>
> FROM sales
>
> GROUP BY cust_id
>
> HAVING SUM(amount_sold) > 94000
>
> AND AVG(amount_sold) < STDDEV(amount_sold)
>
> ORDER BY 3 ;
>
>
>
> The original table and data are from Oracle sh.sales table
>
>
>
> Oracle comes back for this query with
>
>
>
> CUSTOMER_ID NUMBER_OF_ORDERS TOTAL_CUSTOMER_AMOUNT AVERAGE_ORDER
> STANDARD_DEVIATION
>
> ----------- ---------------- --------------------- -------------
> ------------------
>
>        1743              238              94786.13     398.26105
> 582.26845
>
>         429              231              94819.41    410.473636
> 615.038404
>
>        2994              227              94862.61     417.89696
> 624.53793
>
>        6395              268              97010.48    361.979403
> 577.991448
>
>       12783              240              97573.55    406.556458
> 591.6785
>
>        4974              235              98006.16    417.047489
> 625.670115
>
>       42167              266              98585.96     370.62391
> 592.079099
>
>       10747              256              99578.09    388.976914
> 601.938312
>
>       11407              248             103412.66    416.986532
> 623.479751
>
>
>
> 9 rows selected.
>
>
>
> Ordered by TOTAL_CUSTOMER_AMOUNT
>
>
>
> And hive returns for the same query
>
>
>
>
> +--------------+-------------------+------------------------+---------------------+---------------------+--+
>
> | customer_id  | number_of_orders  | total_customer_amount  |
> average_order    | standard_deviation  |
>
>
> +--------------+-------------------+------------------------+---------------------+---------------------+--+
>
> | 42167.0      | 266               | 98585.96000000002      |
> 370.6239097744362   | 590.965120684093    |
>
> | 12783.0      | 240               | 97573.54999999996      |
> 406.5564583333332   | 590.4445500393804   |
>
> | 11407.0      | 248               | 103412.65999999995     |
> 416.9865322580643   | 622.221465710723    |
>
> | 10747.0      | 256               | 99578.08999999997      |
> 388.9769140624999   | 600.7615005975689   |
>
> | 6395.0       | 268               | 97010.47999999998      |
> 361.97940298507456  | 576.9120977984521   |
>
> | 4974.0       | 235               | 98006.16000000002      |
> 417.0474893617022   | 624.337482834059    |
>
> | 2994.0       | 227               | 94862.61000000006      |
> 417.89696035242315  | 623.1607772763742   |
>
> | 1743.0       | 238               | 94786.12999999993      |
> 398.2610504201678   | 581.0439095219863   |
>
> | 429.0        | 231               | 94819.41000000006      |
> 410.4736363636366   | 613.7057080691426   |
>
>
> +--------------+-------------------+------------------------+---------------------+---------------------+--+
>
> 9 rows selected (215.774 seconds)
>
>
>
> But ordering in Hive does not seem to be correct! Please note ordering is
> on column three, *total_customer_amount*
>
>
>
> I also tried this in Sybase and got the same as Oracle.
>
>
>
> Adaptive Server cpu time: 100 ms.
>
> Customer_ID          Number_of_orders
> Total_customer_amount
> Average_order                                        Standard_deviation
>
> -------------------- ----------------
> -----------------------------------------
> ----------------------------------------------------
> ---------------------------
>
>                 1743              238
> 94786.13
> 398.2610504201680                  582.268450
>
>                   429              231
> 94819.41
>                      410.4736363636363                  615.038404
>
>                  2994              227
> 94862.61
> 417.8969603524229                  624.537930
>
>                  6395              268
> 97010.48
> 361.9794029850746                  577.991448
>
>                 12783              240
> 97573.55
> 406.5564583333333                  591.678500
>
>                  4974              235
> 98006.16
> 417.0474893617021                  625.670115
>
>                 42167              266
>                    98585.96
> 370.6239097744360                  592.079099
>
>                 10747              256
> 99578.09                                    388.9769140625000
>     601.938312
>
>                 11407              248
> 103412.66
> 416.9865322580645                  623.479751
>
>
>
> I tried Google search and seems to be different suggestions. May be I have
> to rewrite the code?
>
>
>
> Thanks
>
>
>
> Mich Talebzadeh
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> *Publications due shortly:*
>
> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
> Coherence Cache*
>
>
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this
> message shall not be understood as given or endorsed by Peridale Ltd, its
> subsidiaries or their employees, unless expressly so stated. It is the
> responsibility of the recipient to ensure that this email is virus free,
> therefore neither Peridale Ltd, its subsidiaries nor their employees accept
> any responsibility.
>
>
>
>
>

Mime
View raw message