Return-Path: X-Original-To: apmail-hive-user-archive@www.apache.org Delivered-To: apmail-hive-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id C13F017269 for ; Mon, 30 Mar 2015 04:26:04 +0000 (UTC) Received: (qmail 24867 invoked by uid 500); 30 Mar 2015 04:26:02 -0000 Delivered-To: apmail-hive-user-archive@hive.apache.org Received: (qmail 24799 invoked by uid 500); 30 Mar 2015 04:26:02 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 24789 invoked by uid 99); 30 Mar 2015 04:26:02 -0000 Received: from mail-relay.apache.org (HELO mail-relay.apache.org) (140.211.11.15) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 30 Mar 2015 04:26:02 +0000 Received: from [192.168.1.108] (c-67-180-199-97.hsd1.ca.comcast.net [67.180.199.97]) by mail-relay.apache.org (ASF Mail Server at mail-relay.apache.org) with ESMTPSA id 284481A012D; Mon, 30 Mar 2015 04:25:56 +0000 (UTC) User-Agent: Microsoft-MacOutlook/14.4.8.150116 Date: Sun, 29 Mar 2015 21:25:53 -0700 Subject: Re: ORDER BY clause in Hive From: Gopal Vijayaraghavan Sender: Gopal Vijayaraghavan To: "user@hive.apache.org" CC: Lefty Leverenz Message-ID: Thread-Topic: ORDER BY clause in Hive References: <05d701d069b4$c71f6e60$555e4b20$@co.uk> <05f401d06a02$35848320$a08d8960$@co.uk> In-Reply-To: Mime-version: 1.0 Content-type: multipart/alternative; boundary="B_3510509157_4492816" > This message is in MIME format. Since your mail reader does not understand this format, some or all of this message may not be legible. --B_3510509157_4492816 Content-type: text/plain; charset="ISO-8859-1" Content-transfer-encoding: quoted-printable Hi Lefty, Couldn=B9t find the documentation for what hive.groupby.orderby.position.alias=3Dtrue does. I suspect that might be what Mich was looking for (though I tend to write the column names explicitly). Cheers, Gopal From: Lefty Leverenz Reply-To: "user@hive.apache.org" Date: Sunday, March 29, 2015 at 8:32 PM To: "user@hive.apache.org" Subject: Re: ORDER BY clause in Hive I added information about this in the Order By section of the wiki. Thanks, Mich and Gopal! -- Lefty On Sun, Mar 29, 2015 at 5:24 AM, Mich Talebzadeh wrote: > Hi Lefty, Gopal, > =20 > It appears that ORDER BY 3 is not interpreted as ORDERR BY =B3the result se= t > column three=B2 which standard SQL as evident from Oracle and Sybase does. > =20 > So I made it an ORDER BY from the result set EXPLICITELY as shown below a= nd it > worked OK > =20 > SELECT > rs.Customer_ID > , rs.Number_of_orders > , rs.Total_customer_amount > , rs.Average_order > , rs.Standard_deviation > FROM > ( > SELECT cust_id AS Customer_ID, > COUNT(amount_sold) AS Number_of_orders, > SUM(amount_sold) AS Total_customer_amount, > AVG(amount_sold) AS Average_order, > STDDEV(amount_sold) AS Standard_deviation > FROM sales > GROUP BY cust_id > HAVING SUM(amount_sold) > 94000 > AND AVG(amount_sold) < STDDEV(amount_sold) > ) rs > ORDER BY > rs.Total_customer_amount > ; > =20 > =20 > +-----------------+----------------------+---------------------------+---= ----- > -------------+------------------------+--+ > | rs.customer_id | rs.number_of_orders | rs.total_customer_amount | > rs.average_order | rs.standard_deviation | > +-----------------+----------------------+---------------------------+---= ----- > -------------+------------------------+--+ > | 1743.0 | 238 | 94786.12999999993 | > 398.2610504201678 | 581.0439095219863 | > | 429.0 | 231 | 94819.41000000006 | > 410.4736363636366 | 613.7057080691426 | > | 2994.0 | 227 | 94862.61000000006 | > 417.89696035242315 | 623.1607772763742 | > | 6395.0 | 268 | 97010.47999999998 | > 361.97940298507456 | 576.9120977984521 | > | 12783.0 | 240 | 97573.54999999996 | > 406.5564583333332 | 590.4445500393804 | > | 4974.0 | 235 | 98006.16000000002 | > 417.0474893617022 | 624.337482834059 | > | 42167.0 | 266 | 98585.96000000002 | > 370.6239097744362 | 590.965120684093 | > | 10747.0 | 256 | 99578.08999999997 | > 388.9769140624999 | 600.7615005975689 | > | 11407.0 | 248 | 103412.65999999995 | > 416.9865322580643 | 622.221465710723 | > +-----------------+----------------------+---------------------------+---= ----- > -------------+------------------------+--+ > 9 rows selected (209.699 seconds) > =20 > =20 > Regards, > =20 >=20 > Mich Talebzadeh > =20 > http://talebzadehmich.wordpress.com > =20 > Publications due shortly: > Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and > Coherence Cache > =20 > NOTE: The information in this email is proprietary and confidential. This > message is for the designated recipient only, if you are not the intended > recipient, you should destroy it immediately. Any information in this mes= sage > shall not be understood as given or endorsed by Peridale Ltd, its subsidi= aries > or their employees, unless expressly so stated. It is the responsibility = of > the recipient to ensure that this email is virus free, therefore neither > Peridale Ltd, its subsidiaries nor their employees accept any responsibil= ity. > =20 >=20 > From: Mich Talebzadeh [mailto:mich@peridale.co.uk] > Sent: 29 March 2015 00:11 > To: user@hive.apache.org > Subject: ORDER BY clause in Hive >=20 > =20 > Hi, > =20 > Can someone point me to doc or otherwise to see if ORDER BY clause in Hiv= e is > working OK > =20 > I have a simple aggregate query as follows: > =20 > SELECT cust_id AS Customer_ID, > COUNT(amount_sold) AS Number_of_orders, > SUM(amount_sold) AS Total_customer_amount, > AVG(amount_sold) AS Average_order, > STDDEV(amount_sold) AS Standard_deviation > FROM sales > GROUP BY cust_id > HAVING SUM(amount_sold) > 94000 > AND AVG(amount_sold) < STDDEV(amount_sold) > ORDER BY 3 ; > =20 > The original table and data are from Oracle sh.sales table > =20 > Oracle comes back for this query with > =20 > CUSTOMER_ID NUMBER_OF_ORDERS TOTAL_CUSTOMER_AMOUNT AVERAGE_ORDER > STANDARD_DEVIATION > ----------- ---------------- --------------------- ------------- > ------------------ > 1743 238 94786.13 398.26105 > 582.26845 > 429 231 94819.41 410.473636 > 615.038404 > 2994 227 94862.61 417.89696 > 624.53793 > 6395 268 97010.48 361.979403 > 577.991448 > 12783 240 97573.55 406.556458 > 591.6785 > 4974 235 98006.16 417.047489 > 625.670115 > 42167 266 98585.96 370.62391 > 592.079099 > 10747 256 99578.09 388.976914 > 601.938312 > 11407 248 103412.66 416.986532 > 623.479751 > =20 > 9 rows selected. > =20 > Ordered by TOTAL_CUSTOMER_AMOUNT > =20 > And hive returns for the same query > =20 > +--------------+-------------------+------------------------+------------= ----- > ----+---------------------+--+ > | customer_id | number_of_orders | total_customer_amount | average_= order > | standard_deviation | > +--------------+-------------------+------------------------+------------= ----- > ----+---------------------+--+ > | 42167.0 | 266 | 98585.96000000002 | > 370.6239097744362 | 590.965120684093 | > | 12783.0 | 240 | 97573.54999999996 | > 406.5564583333332 | 590.4445500393804 | > | 11407.0 | 248 | 103412.65999999995 | > 416.9865322580643 | 622.221465710723 | > | 10747.0 | 256 | 99578.08999999997 | > 388.9769140624999 | 600.7615005975689 | > | 6395.0 | 268 | 97010.47999999998 | > 361.97940298507456 | 576.9120977984521 | > | 4974.0 | 235 | 98006.16000000002 | > 417.0474893617022 | 624.337482834059 | > | 2994.0 | 227 | 94862.61000000006 | > 417.89696035242315 | 623.1607772763742 | > | 1743.0 | 238 | 94786.12999999993 | > 398.2610504201678 | 581.0439095219863 | > | 429.0 | 231 | 94819.41000000006 | > 410.4736363636366 | 613.7057080691426 | > +--------------+-------------------+------------------------+------------= ----- > ----+---------------------+--+ > 9 rows selected (215.774 seconds) > =20 > But ordering in Hive does not seem to be correct! Please note ordering is= on > column three, total_customer_amount > =20 > I also tried this in Sybase and got the same as Oracle. > =20 > Adaptive Server cpu time: 100 ms. > Customer_ID Number_of_orders Total_customer_amount > Average_order Standard_deviation > -------------------- ---------------- > ----------------------------------------- > ---------------------------------------------------- > --------------------------- > 1743 238 > 94786.13 398.2610504201680 > 582.268450 > 429 231 > 94819.41 410.4736363636363 > 615.038404 > 2994 227 > 94862.61 417.8969603524229 > 624.537930 > 6395 268 > 97010.48 361.9794029850746 > 577.991448 > 12783 240 > 97573.55 406.5564583333333 > 591.678500 > 4974 235 > 98006.16 417.0474893617021 > 625.670115 > 42167 266 > 98585.96 370.6239097744360 > 592.079099 > 10747 256 > 99578.09 388.9769140625000 > 601.938312 > 11407 248 > 103412.66 416.9865322580645 > 623.479751 > =20 > I tried Google search and seems to be different suggestions. May be I hav= e to > rewrite the code? > =20 > Thanks > =20 > Mich Talebzadeh > =20 > http://talebzadehmich.wordpress.com > =20 > Publications due shortly: > Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and > Coherence Cache > =20 > NOTE: The information in this email is proprietary and confidential. This > message is for the designated recipient only, if you are not the intended > recipient, you should destroy it immediately. Any information in this mes= sage > shall not be understood as given or endorsed by Peridale Ltd, its subsidi= aries > or their employees, unless expressly so stated. It is the responsibility = of > the recipient to ensure that this email is virus free, therefore neither > Peridale Ltd, its subsidiaries nor their employees accept any responsibil= ity. > =20 --B_3510509157_4492816 Content-type: text/html; charset="ISO-8859-1" Content-transfer-encoding: quoted-printable
Hi Lefty,

Couldn’t find the documentation for what hive.groupby.orderby.po= sition.alias=3Dtrue does.

I suspect that might be wha= t Mich was looking for (though I tend to write the column names explicitly).=

Cheers,
Gopal

From: Lefty Leverenz <leftyleverenz@gmail.com>
Reply-To: "user@hive= .apache.org" <user@hive.apache.= org>
Date: Sunday, March 29= , 2015 at 8:32 PM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: ORDER BY clause in Hive

<= div>
=
I added information about this in the Order By section of the wiki.  Thanks, Mich and Gopal!

-- Lefty

On Sun, Mar 2= 9, 2015 at 5:24 AM, Mich Talebzadeh <mich@peridale.co.u= k> wrote:

Hi Lefty, Gopal,

=

 

It appears that ORDER BY 3 is= not interpreted as ORDERR BY “the result set column three” whic= h standard SQL as evident from Oracle and Sybase does.<= /p>

 

So I made it an ORDER BY f= rom the result set EXPLICITELY as shown below and it worked OK=

 

SELECT=

      &= nbsp;   rs.Customer_ID

&n= bsp;       , rs.Number_of_orders=

        , rs.= Total_customer_amount

  &= nbsp;     , rs.Average_order

        , rs.Standard_devia= tion

FROM

<= span class=3D"">

(

&n= bsp;       SELECT cust_id AS Customer_ID,

      &nb= sp; COUNT(amount_sold) AS Number_of_orders,

        SUM(amount_sold) AS Tota= l_customer_amount,

  &nbs= p;     AVG(amount_sold) AS Average_order,<= /span>

        STDDEV= (amount_sold) AS Standard_deviation

        FROM sales<= /p>

        GROUP BY cust= _id

     &= nbsp;  HAVING SUM(amount_sold) > 94000

        AND AVG(amount_sold) = < STDDEV(amount_sold)

) r= s

ORDER BY

=

;<= u>

 

 

+-----------------+--------------= --------+---------------------------+---------------------+-----------------= -------+--+

| rs.customer_id  = | rs.number_of_orders  | rs.total_customer_amount  |  rs.aver= age_order   | rs.standard_deviation  |

+-----------------+----------------------+---------------= ------------+---------------------+------------------------+--+

| 1743.0    =       | 238      &nbs= p;           | 94786.12999= 999993         | 398.2610504201678&n= bsp;  | 581.0439095219863      |=

| 429.0       = ;    | 231        &nb= sp;         | 94819.41000000006 = ;        | 410.4736363636366   = | 613.7057080691426      |

=

| 6395.0          | 268 = ;            &nb= sp;    | 97010.47999999998      = ;   | 361.97940298507456  | 576.9120977984521  &nbs= p;   |

| 12783.0 &nb= sp;       | 240     &= nbsp;            | 97= 573.54999999996         | 406.556458= 3333332   | 590.4445500393804      |

| 4974.0     &= nbsp;    | 235        = ;          | 98006.16000000002&= nbsp;        | 417.0474893617022 &nb= sp; | 624.337482834059       |

| 42167.0       = ;  | 266          &nb= sp;       | 98585.96000000002   = ;      | 370.6239097744362   | 590.965120= 684093       |

| 10747.0         | 256&nb= sp;            &= nbsp;    | 99578.08999999997     &nb= sp;   | 388.9769140624999   | 600.7615005975689 &nb= sp;    |

| 11407.0         | 248 &nb= sp;            &= nbsp;   | 103412.65999999995      &n= bsp; | 416.9865322580643   | 622.221465710723   &nb= sp;   |

+---------= --------+----------------------+---------------------------+----------------= -----+------------------------+--+

= 9 rows selected (209.699 seconds)

&nb= sp;

 

Regards,=

 <= /p>

Mich Talebzadeh

 

http://talebzadehmich.wordpr= ess.com

 

Publications due shortly:

Creating in-memory Data Grid for Tradin= g Systems with Oracle TimesTen and Coherence Cache<= /p>

 

NOTE: The information in this email is proprie= tary and confidential. This message is for the designated recipient only, if= you are not the intended recipient, you should destroy it immediately. Any information in this message shall not b= e understood as given or endorsed by Peridale Ltd, its subsidiaries or their= employees, unless expressly so stated. It is the responsibility of the reci= pient to ensure that this email is virus free, therefore neither Peridale Ltd, its subsidiaries nor their = employees accept any responsibility.=

 

<= div style=3D"border:none;border-top:solid #b5c4df 1.0pt;padding:3.0pt 0cm 0cm = 0cm">

From: Mich Talebzadeh [mailto:mich@peridale.co.uk]
Sent: 29 March 2015 00:11
To: user@hive.apache.org
Subject: OR= DER BY clause in Hive

 

Hi,

 

Can someone point me = to doc or otherwise to see if ORDER BY clause in Hive is working OK

 

= I have a simp= le aggregate query as follows:

=  =

SELECT cust_id AS Customer_ID,=

COUNT(amount_sold) AS Number_of_orders,<= /u>

SUM(amount_sold) AS Total_customer_am= ount,

AVG(amount_sold) AS Average_= order,

STDDEV(amount_sold) AS Stan= dard_deviation

FROM sales

GROUP BY cust_id

<= p class=3D"MsoNormal">HAVING SUM(amount_sold) > 94000

=

ORDER BY 3 ;

<= u> 

The original table and data are from O= racle sh.sales table

 

Oracle comes back for this query with

 

CUSTOMER_ID NUMBER= _OF_ORDERS TOTAL_CUSTOMER_AMOUNT AVERAGE_ORDER STANDARD_DEVIATION

----------- ---------------- ---------------------= ------------- ------------------

&= nbsp;      1743      =         238     =          94786.13   &= nbsp; 398.26105          582.26= 845

     &= nbsp;  429          &= nbsp;   231         &= nbsp;    94819.41    410.473636  &nb= sp;      615.038404

       2994    &= nbsp;         227   &= nbsp;          94862.61 &n= bsp;   417.89696        &n= bsp; 624.53793

   &n= bsp;   6395         &= nbsp;    268        &= nbsp;     97010.48    361.979403 &nb= sp;       577.991448

<= p class=3D"MsoNormal">      12783    =           240   =            97573.55 &= nbsp;  406.556458         =   591.6785

   &= nbsp;   4974         =      235        =       98006.16    417.047489 &n= bsp;       625.670115

=

    = ;  10747          &nb= sp;   256         &nb= sp;    99578.09    388.976914   = ;      601.938312

      11407     &n= bsp;        248    &n= bsp;        103412.66    4= 16.986532         623.479751<= u>

 

9 rows selected.

 <= /span>

Ordered by TOTAL_CUSTOMER_AMOUNT

<= p class=3D"MsoNormal"> 

And hive returns for the same = query

 

+--------------+-------------------+------------------------+-----= ----------------+---------------------+--+

| customer_id  | number_of_orders  | total_customer_amount  |    average_order &nb= sp;  | standard_deviation  |

+--------------+-------------------+------------------------+-----------= ----------+---------------------+--+

| 12783.0&= nbsp;     | 240       = ;        | 97573.54999999996  &= nbsp;   | 406.5564583333332   | 590.4445500393804 &= nbsp; |

| 11407.0   =    | 248         &nbs= p;     | 103412.65999999995     | 41= 6.9865322580643   | 622.221465710723    |=

| 10747.0      | 25= 6            &nb= sp;  | 99578.08999999997      | 388.9769140624= 999   | 600.7615005975689   |

| 6395.0       | 268  = ;             | = 97010.47999999998      | 361.97940298507456  |= 576.9120977984521   |

| = 4974.0       | 235    &nbs= p;          | 98006.16000000002=       | 417.0474893617022   | 624.3374828= 34059    |

| 2994.0&= nbsp;      | 227      = ;         | 94862.61000000006 &= nbsp;    | 417.89696035242315  | 623.1607772763742 =   |

| 1743.0   =     | 238        &nbs= p;      | 94786.12999999993    =   | 398.2610504201678   | 581.0439095219863   |<= /u>

| 429.0     &= nbsp;  | 231          = ;     | 94819.41000000006      = | 410.4736363636366   | 613.7057080691426   |<= /u>

+--------------+-------------------+---------= ---------------+---------------------+---------------------+--+

9 rows selected (215.774 seconds)=

 

But ordering in Hi= ve does not seem to be correct! Please note ordering is on column three, total_customer_amount

<= span style=3D"font-size: 10pt; font-family: Arial, sans-serif;"> <= u>

I also tried this in Sybase and got the same as Ora= cle.

 

Adaptive Server cpu time: 100 ms.

Customer_ID          Nu= mber_of_orders Total_customer_amount      &nbs= p;            &n= bsp; Average_order         &nbs= p;            &n= bsp;            =      Standard_deviation

-------------------- ---------------- -----------------------------= ------------ ---------------------------------------------------- ----------= -----------------

   = ;            &nb= sp;1743           &nb= sp;  238          &nb= sp;            &= nbsp;          94786.13 &n= bsp;            =             &nbs= p;         398.2610504201680 &n= bsp;            =     582.268450

<= span style=3D"font-size: 8pt; font-family: 'Courier New'; color: black;"> = ;            &nb= sp;    429        &nb= sp;     231       &nb= sp;            &= nbsp;            = ; 94819.41           =              &nb= sp;           410.473= 6363636363           =        615.038404

          &nb= sp;      2994      &n= bsp;       227     &n= bsp;            =             &nbs= p;   94862.61         = ;            &nb= sp;            &= nbsp; 417.8969603524229         = ;         624.537930

        &n= bsp;        6395    &= nbsp;         268   &= nbsp;            = ;            &nb= sp;     97010.48      &nbs= p;            &n= bsp;            =     361.9794029850746      &nbs= p;           577.991448=

      &= nbsp;         12783   = ;           240  = ;            &nb= sp;            &= nbsp;      97573.55     &n= bsp;            =             &nbs= p;     406.5564583333333     &n= bsp;            591.6= 78500

     = ;            4974&nbs= p;             2= 35            &n= bsp;            =          98006.16   &= nbsp;            = ;            &nb= sp;       417.0474893617021   &= nbsp;            = ;  625.670115

  &nbs= p;             4= 2167            =   266           =              &nb= sp;         98585.96  = ;            &nb= sp;            &= nbsp;        370.6239097744360  = ;            &nb= sp;   592.079099

 &n= bsp;            =   10747          &nbs= p;   256         &nbs= p;            &n= bsp;           99578.09&nb= sp;            &= nbsp;            = ;          388.9769140625000&nb= sp;             =     601.938312

=             &nbs= p;   11407         &n= bsp;    248        &n= bsp;            =             103412.66=             &nbs= p;            &n= bsp;          416.9865322580645=             &nbs= p;     623.479751

 

I tried Google search and seems to= be different suggestions. May be I have to rewrite the code?<= /span>

 

Thanks

 

Mich Talebzadeh

 

http://talebzadehmich.wordpress.com<= /p>

 

<= u>Publications due shortly:<= u>

= Creating in-memory Data Grid for Trading Systems with Oracle TimesTen an= d Coherence Cache

 

NOT= E: The information in this email is proprietary and confidential. This messa= ge is for the designated recipient only, if you are not the intended recipie= nt, you should destroy it immediately. Any information in this message shall not b= e understood as given or endorsed by Peridale Ltd, its subsidiaries or their= employees, unless expressly so stated. It is the responsibility of the reci= pient to ensure that this email is virus free, therefore neither Peridale Ltd, its subsidiaries nor their = employees accept any responsibility.=

 

<= /div>

--B_3510509157_4492816--