hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kishore A <kishore.atmak...@gmail.com>
Subject Re: Question on Implementing CASE in Hive Join
Date Tue, 03 May 2016 13:07:53 GMT
Thank you for your reply on this.

Your recent query is working perfectly.

Yes, we don't have primary key here in this case. Are there some other ways
of achieving the same output when we don't have a primary key?

On Wed, Apr 27, 2016 at 12:11 PM, Markovitz, Dudu <dmarkovitz@paypal.com>
wrote:

> In order to achieve the LEFT JOIN I’m generating a primary key to table A
> using ROW_NUMBER.
>
> If you already have a primary key, you can use it and skip the WITH clause.
>
>
>
> Dudu
>
>
>
> *with*        a_rn *as* (*select* row_number () over () *as* rn,* *from* a)
>
>
>
> *select*      *
>
>
>
> *from*                    a_rn *as* a
>
>
>
>             *left* *join*  (*select*      a.rn
>
>                                    ,b.*
>
>
>
>                         *from*                    a_rn *as* a
>
>
>
>                                     *cross* *join*  b
>
>
>
>                         *where*       a.*type*      = b.*type*
>
>
>
>                                 *and* a.code      like *case* b.code    *when* 'All'
*then* '%' *else* b.code     *end*
>
>                                 *and* a.country   like *case* b.country *when* 'All'
*then* '%' *else* b.country  *end*
>
>                                 *and* a.flag      like *case* b.flag    *when* 'All'
*then* '%' *else* b.flag     *end*
>
>                         )
>
>                         *as* b
>
>
>
>             *on*          b.rn =
>
>                         a.rn
>
> *;*
>
>
>
> 1              101         USA       OAS       Y
> 1              101         All           OAS       All           John
>
> 2              142         US          OAS       X
> 2              142         All           OAS       All           John
>
> 3              101         US          Retail     X
> 3              All           All           Retail     All           John
>
> 4              142         USA       MO        X             NULL
> NULL     NULL     NULL     NULL     NULL
>
> 5              192         US          Mod      X
> 5              192         All           Mod      All           Sean
>
> 6              101         USA       Sal          X             NULL
> NULL     NULL     NULL     NULL     NULL
>
> 7              101         USA       Win        X             NULL
> NULL     NULL     NULL     NULL     NULL
>
> 8              101         USA       Feed      X             NULL
> NULL     NULL     NULL     NULL     NULL
>
> 9              142         CAN       Feed      X             NULL
> NULL     NULL     NULL     NULL     NULL
>
> 10           101         USA       OCP       X             NULL
> NULL     NULL     NULL     NULL     NULL
>
> 11           101         USA       STORE   X             NULL     NULL
> NULL     NULL     NULL     NULL
>
> 12           129         USA       Tax         Y              NULL
> NULL     NULL     NULL     NULL     NULL
>
> 13           119         USA       Tax         Y              NULL
> NULL     NULL     NULL     NULL     NULL
>
> 14           101         USA       Pus         Y              NULL
> NULL     NULL     NULL     NULL     NULL
>
> 15           142         CN          Pus         Y
> 15           142         All           Pus         Y              Smith
>
> 16           142         CA          Pus         Y
> 16           142         All           Pus         Y              Smith
>
> 17           142         US          Pus         S              NULL
> NULL     NULL     NULL     NULL     NULL
>
> 18           101         CN          Tax         Y
> 18           101         All           Tax         Y              Smith
>
> 19           101         CA          Tax         Y
> 19           101         All           Tax         Y              Smith
>
> 20           101         USA       Tax         Y              20
> 101         All           Tax         Y              Smith
>
> 21           101         US          Tax         X
> 21           101         All           Tax         X             John
>
>
>
> *From:* Kishore A [mailto:kishore.atmakuri@gmail.com]
> *Sent:* Wednesday, April 27, 2016 9:15 PM
> *To:* user@hive.apache.org
> *Subject:* RE: Question on Implementing CASE in Hive Join
>
>
>
> I feel the mail at your end has got a different format.
>
> Remove * and - in the data what you see...That is my final data....You can
> correlate with B
>
> 2. I need A LeftJoin B
>
> On Apr 27, 2016 11:41 PM, "Markovitz, Dudu" <dmarkovitz@paypal.com> wrote:
>
> Huston, we have a problem J
>
>
>
> 1.
>
> Data
>
>
>
> Here are the tables with tabs replaced by ‘*’ and spaces replaced by ‘-’.
>
> A is a mismatch of tabs and spaces and B contains only (a very strange
> number of) spaces.
>
>
>
> So you’re using TRIM with all A columns, but how exactly have you defined
> table B?
>
>
>
>
>
> *A*
>
> 101*---US*------Tax*----X
>
> 101*---USA*------Tax*----Y
>
> 101*---CA*------Tax*----Y
>
> 101*---CN*------Tax*----Y
>
> 142*---US*------Pus*----S
>
> 142*---CA*------Pus*----Y
>
> 142*---CN*------Pus*----Y
>
> 101*---USA*------Pus*----Y
>
> 119*---USA*------Tax*----Y
>
> 129*---USA*------Tax*----Y
>
> 101*---USA*------STORE*----X
>
> 101*---USA*------OCP*----X
>
> 142*---CAN*------Feed*----X
>
> 101*---USA*------Feed*----X
>
> 101*---USA*------Win*----X
>
> 101*---USA*------Sal*----X
>
> 192*---US*------Mod*----X
>
> 142*---USA*------MO*----X
>
> 101*---US*------Retail*X
>
> 142*---US*------OAS*----X
>
> 101*---USA*------OAS*----Y
>
>
>
> *B*
>
> All--------All-----------Retail------All----John
>
> All-------US----------OM----------X------John
>
> ALL-----US---------OM----------Y------Smith
>
> 101-----All----------OAS---------All----John
>
> 142-----All----------OAS---------All----John
>
> 192-----All----------Mod----------All----Sean
>
> 101-----All----------Tax-----------X------John
>
> 101-----All---------Tax------------Y------Smith
>
> 142-----US---------Pus----------X------John
>
> 142-----All----------Pus----------Y------Smith--
>
>
>
>
>
> 2.
>
> Requirement
>
>
>
> “A left join B” guarantees that all the records from A appear in the
> result set and that the number of rows in the result set is *at least*
> the number of rows of A.
>
> You should take into consideration that a record in A might have several
> matches from B.
>
>
>
> Also –
>
> In your original question you’ve talked about “B LEFT JOIN A” and now it’s
> “A LEFT JOIN B”.
>
>
>
> 3.
>
> Scenario  1
>
>
>
> I don’t want to start a lecture at that point, but ‘ON‘ and ‘WHERE’ have a
> completely different meaning and use.
>
> You’re conditions in the WHERE clause turned the LEFT JOIN to INNER JOIN.
>
>
>
> 4.
>
> Scenario 2
>
>
>
> I’ve suggested ‘LIKE’ not ‘=’ (but it won’t work because of Hive
> limitations that require equality conditions for JOIN).
>
>
>
> 5.
>
> Scenario 3
>
>
>
> CROSS JOIN does not use ON (Hive lets you do that but it not an SQL
> standard and it’s actually an INNER JOIN).
>
>
>
> 6.
>
> CASE
>
>
>
> CASE is defined by ANSI/ISO and works in Hive the same way it works in HQL
> the same way it works in any other SQL dialect.
>
>
>
> In conclusion -
>
> The solution I gave you in the previous mails works as expected.
>
> You should check your data and your tables’ definitions.
>
>
>
> Just tell me if you want “B LEFT JOIN A” or “A LEFT JOIN B” and I’ll send
> you the exact code.
>
>
>
> Dudu
>
>
>
> *From:* Kishore A [mailto:kishore.atmakuri@gmail.com]
> *Sent:* Wednesday, April 27, 2016 6:30 PM
> *To:* user@hive.apache.org
> *Subject:* Re: Question on Implementing CASE in Hive Join
>
>
>
> Hello Dudu,
>
>
>
> Thank you for previous reply. I tried all the scenarios but nothing is
> working out. I need this condition with with minimal steps like by using
> case when.
>
>
>
> Please find the below
>
>
>
>
>
> TableA
>
> Code   Country    Type     Flag
>
> 101        US         Tax               X
>
> 101        USA      Tax               Y
>
> 101        CA        Tax               Y
>
> 101        CN        Tax               Y
>
> 142        US         Pus    S
>
> 142        CA        Pus    Y
>
> 142        CN        Pus    Y
>
> 101        USA      Pus    Y
>
> 119        USA      Tax               Y
>
> 129        USA      Tax               Y
>
> 101        USA      STORE        X
>
> 101        USA      OCP             X
>
> 142        CAN      Feed             X
>
> 101        USA      Feed             X
>
> 101        USA      Win              X
>
> 101        USA      Sal    X
>
> 192        US         Mod             X
>
> 142        USA      MO               X
>
> 101        US         Retail         X
>
> 142        US         OAS             X
>
> 101        USA      OAS             Y
>
>
>
> TableB
>
> Code   Country    Type     Flag   Value
>
> All        All           Retail      All    John
>
> All       US          OM          X      John
>
> ALL     US         OM          Y      Smith
>
> 101     All          OAS         All    John
>
> 142     All          OAS         All    John
>
> 192     All          Mod          All    Sean
>
> 101     All          Tax           X      John
>
> 101     All         Tax            Y      Smith
>
> 142     US         Pus          X      John
>
> 142     All          Pus          Y      Smith
>
>
>
>
>
> This is how my data in reality looks like. *Type *column is the fixed
> column here
>
>
>
> Requirement : TableA LeftJoin TableB , stamp the matching *Value *from
> TableB. Total result should have the number of rows as TableA
>
>
>
> Scenario  1:
>
>
>
> Select * FROM TableA a
>
>     Left Join TableB b ON
>
>     LOWER(TRIM(a.Type)) = LOWER(TRIM(b.Type))
>
>     WHERE LOWER(TRIM(a.Code)) = (CASE WHEN LOWER(TRIM(b.Code)) =
> LOWER('All') THEN LOWER(TRIM(a.Code)) ELSE LOWER(TRIM(b.Code)) END)
>
>     AND LOWER(TRIM(a.country)) = (CASE WHEN LOWER(TRIM(b.country)) =
> LOWER('All') THEN LOWER(TRIM(a.country)) ELSE LOWER(TRIM(b.country)) END)
>
>     AND LOWER(TRIM(a.flag))  = (CASE WHEN LOWER(TRIM(b.flag)) =
> LOWER('All') THEN LOWER(TRIM(a.flag)) ELSE LOWER(TRIM(b.flag)) END);
>
>
>
> on the output the number of rows should be same as the left table, but as
> I am apply WHERE data is getting filtered out and is giving only
>
> matching records
>
>
>
>
>
>
>
> Scenario 2: Suggested by you
>
>
>
> Select * FROM TableA a
>
>     Left Join TableB b ON
>
>     LOWER(TRIM(a.Type)) = LOWER(TRIM(b.Type))
>
>     AND LOWER(TRIM(a.Code)) = (CASE WHEN LOWER(TRIM(b.Code)) =
> LOWER('All') THEN '%' ELSE LOWER(TRIM(b.Code)) END)
>
>     AND LOWER(TRIM(a.country)) = (CASE WHEN LOWER(TRIM(b.country)) =
> LOWER('All') THEN '%' ELSE LOWER(TRIM(b.country)) END)
>
>     AND LOWER(TRIM(a.flag))  = (CASE WHEN LOWER(TRIM(b.flag)) =
> LOWER('All') THEN '%'            ELSE LOWER(TRIM(b.flag)) END);
>
>
>
> As per the left join this is giving me the number of rows just as the left
> side table but I am not having the matching records as per the
>
> right table. I have NULLs on the right side.
>
>
>
> Scenario 3: Cross Join
>
>
>
> Select * FROM TableA a
>
>     CROSS Join TableB b ON
>
>     LOWER(TRIM(a.Type)) = LOWER(TRIM(b.Type))
>
>     AND LOWER(TRIM(a.Code)) = (CASE WHEN LOWER(TRIM(b.Code)) =
> LOWER('All') THEN '%' ELSE LOWER(TRIM(b.Code)) END)
>
>     AND LOWER(TRIM(a.country)) = (CASE WHEN LOWER(TRIM(b.country)) =
> LOWER('All') THEN '%' ELSE LOWER(TRIM(b.country)) END)
>
>     AND LOWER(TRIM(a.flag))  = (CASE WHEN LOWER(TRIM(b.flag)) =
> LOWER('All') THEN '%'            ELSE LOWER(TRIM(b.flag)) END);
>
>
>
> Giving same result as Scenario 2 - All the left columns with NULLs on right
>
>
>
> I don't know if we still need to fine tune the query as Hive is not
> supporting CASE WHEN just as in SQL. Your thoughts are more helpful to me.
>
>
>
> Kishore
>
>
>
>
>
> On Wed, Apr 20, 2016 at 12:04 PM, Markovitz, Dudu <dmarkovitz@paypal.com>
> wrote:
>
> The second version works as expected (after fixing a typo in the word
> ‘indicator’).
>
> If you don’t get any results you should check your data (maybe the fields
> contains trailing spaces or control characters etc.).
>
>
>
> If you’re willing to replace the ‘OUTER’ with ‘INNER’, there’s another
> option -
>
>
>
> *select*      *
>
>
>
> *from*                    b
>
>
>
>             *cross* *join*  a
>
>
>
> *where*       a.*type*      = b.*type*
>
>         *and* a.code      like *case* b.code      *when* 'ALL' *then* '%'
> *else* b.code      *end*
>
>         *and* a.indicator like *case* b.indicator *when* 'ALL' *then* '%'
> *else* b.indicator *end*
>
> *;*
>
>
>
> Dudu
>
>
>
>
>
> *From:* Kishore A [mailto:kishore.atmakuri@gmail.com]
> *Sent:* Wednesday, April 20, 2016 5:04 PM
>
>
> *To:* user@hive.apache.org
> *Subject:* Re: Question on Implementing CASE in Hive Join
>
>
>
> Hi Dudu,
>
>
>
> Thank you for sending queries around this.
>
>
>
> I have run these queries and below are the observations
>
>
>
> 1. It did return the same error as before" SemanticException [Error
> 10017]: Line 4:4 Both left and right aliases encountered in JOIN 'code'"
>
>
>
> 2. Query execution is successful but not retrieving any results out of it.
>
>
>
> I am clueless and not able to proceed to next step until this is resolved.
> Do you have any other suggestions please?
>
>
>
> Kishore
>
>
>
> On Tue, Apr 19, 2016 at 6:08 AM, Markovitz, Dudu <dmarkovitz@paypal.com>
> wrote:
>
> Please try the following two options.
>
> Option 2 might be better, performance wise (depending of the data volume
> and characteristics).
>
>
>
> P.s.
>
> I didn’t understand the explanation about the LEFT JOIN
>
>
>
>
>
> Dudu
>
>
>
> 1.
>
>
>
> *select*      b.code
>
>            ,b.*value*
>
>
>
> *from*                    b
>
>
>
>             *left* *join*   a
>
>
>
>             *on*          a.*type*      = b.*type*
>
>                     *and* a.code      like *case* b.code       *when*
> 'ALL' *then* '%' *else* b.code       *end*
>
>                     *and* a.indicator like *case* b.indicatior *when*
> 'ALL' *then* '%' *else* b.indicatior *end*
>
> *;*
>
>
>
>
>
>
>
> 2.
>
>
>
> *select*      b.code
>
>            ,b.*value*
>
>
>
> *from*                    b
>
>
>
>             *left* *join*   a
>
>
>
>             *on*          a.*type*      = b.*type*
>
>                     *and* a.code      = b.code
>
>                     *and* a.indicator = b.indicatior
>
>
>
> *where*       b.code       != 'ALL'
>
>         *and* b.indicatior != 'ALL'
>
>
>
> *union* *all*
>
>
>
> *select*      b.code
>
>            ,b.*value*
>
>
>
> *from*                    b
>
>
>
>             *left* *join*   a
>
>
>
>             *on*          a.*type*      = b.*type*
>
>                     *and* a.indicator = b.indicatior
>
>
>
> *where*       b.code        = 'ALL'
>
>         *and* b.indicatior != 'ALL'
>
>
>
> *union* *all*
>
>
>
> *select*      b.code
>
>            ,b.*value*
>
>
>
> *from*                    b
>
>
>
>             *left* *join*   a
>
>
>
>             *on*          a.*type*      = b.*type*
>
>                     *and* a.code      = b.code
>
>
>
> *where*       b.code       != 'ALL'
>
>         *and* b.indicatior  = 'ALL'
>
>
>
> *union* *all*
>
>
>
> *select*      b.code
>
>            ,b.*value*
>
>
>
> *from*                    b
>
>
>
>             *left* *join*   a
>
>
>
>             *on*          a.*type*      = b.*type*
>
>
>
> *where*       b.code       = 'ALL'
>
>         *and* b.indicatior = 'ALL'
>
> *;*
>
>
>
>
>
> *From:* Kishore A [mailto:kishore.atmakuri@gmail.com]
> *Sent:* Tuesday, April 19, 2016 3:51 PM
> *To:* user@hive.apache.org
> *Subject:* Re: Question on Implementing CASE in Hive Join
>
>
>
> Hi Dudu,
>
>
>
> Actually we use both fields from left and right tables, I mentioned right
> table just for my convenience to check whether ALL from right table can be
> pulled as per join condition match.
>
>
>
> One more reason why we use left join is we should not have extra columns
> after join.
>
>
>
> Kishore
>
>
>
>
>
>
>
> On Tue, Apr 19, 2016 at 5:46 AM, Markovitz, Dudu <dmarkovitz@paypal.com>
> wrote:
>
> Before dealing with the technical aspect, can you please explain what is
> the point of using LEFT JOIN without selecting any field from table A?
>
>
>
> Thanks
>
>
>
> Dudu
>
>
>
> *From:* Kishore A [mailto:kishore.atmakuri@gmail.com]
> *Sent:* Tuesday, April 19, 2016 2:29 PM
> *To:* user@hive.apache.org
> *Subject:* Question on Implementing CASE in Hive Join
>
>
>
> Hi,
>
>
>
> I have a scenario to implement to cases in Hive Joins. I need to implement
> case on the value on which join condition to be applied.
>
>
>
> Table A
>
> Code// Type// Indicator// Value//
>
> A      1      XYZ         John
>
> B      1      PQR         Smith
>
> C      2      XYZ         John
>
> C      2      PQR         Smith
>
> D      3      PQR         Smith
>
> E      3      XYZ         Smith
>
> F      4      MNO         Smith
>
> G      3      MNO         Smith
>
> ...
>
>

Mime
View raw message