hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Markovitz, Dudu" <dmarkov...@paypal.com>
Subject RE: Question on Implementing CASE in Hive Join
Date Tue, 03 May 2016 14:01:25 GMT
I’m not sure I understand the question since the recent query is not using a table primary
key but generating on using row_number()

From: Kishore A [mailto:kishore.atmakuri@gmail.com]
Sent: Tuesday, May 03, 2016 4:08 PM
To: user@hive.apache.org
Subject: Re: Question on Implementing CASE in Hive Join

Thank you for your reply on this.

Your recent query is working perfectly.

Yes, we don't have primary key here in this case. Are there some other ways of achieving the
same output when we don't have a primary key?

On Wed, Apr 27, 2016 at 12:11 PM, Markovitz, Dudu <dmarkovitz@paypal.com<mailto:dmarkovitz@paypal.com>>
wrote:
In order to achieve the LEFT JOIN I’m generating a primary key to table A using ROW_NUMBER.
If you already have a primary key, you can use it and skip the WITH clause.

Dudu


with        a_rn as (select row_number () over () as rn,* from a)



select      *



from                    a_rn as a



            left join  (select      a.rn

                                   ,b.*



                        from                    a_rn as a



                                    cross join  b



                        where       a.type      = b.type



                                and a.code      like case b.code    when 'All' then '%' else
b.code     end

                                and a.country   like case b.country when 'All' then '%' else
b.country  end

                                and a.flag      like case b.flag    when 'All' then '%' else
b.flag     end

                        )

                        as b



            on          b.rn =

                        a.rn

;

1              101         USA       OAS       Y              1              101         All
          OAS       All           John
2              142         US          OAS       X             2              142        
All           OAS       All           John
3              101         US          Retail     X             3              All       
   All           Retail     All           John
4              142         USA       MO        X             NULL     NULL     NULL     NULL
    NULL     NULL
5              192         US          Mod      X             5              192         All
          Mod      All           Sean
6              101         USA       Sal          X             NULL     NULL     NULL   
 NULL     NULL     NULL
7              101         USA       Win        X             NULL     NULL     NULL     NULL
    NULL     NULL
8              101         USA       Feed      X             NULL     NULL     NULL     NULL
    NULL     NULL
9              142         CAN       Feed      X             NULL     NULL     NULL     NULL
    NULL     NULL
10           101         USA       OCP       X             NULL     NULL     NULL     NULL
    NULL     NULL
11           101         USA       STORE   X             NULL     NULL     NULL     NULL 
   NULL     NULL
12           129         USA       Tax         Y              NULL     NULL     NULL     NULL
    NULL     NULL
13           119         USA       Tax         Y              NULL     NULL     NULL     NULL
    NULL     NULL
14           101         USA       Pus         Y              NULL     NULL     NULL     NULL
    NULL     NULL
15           142         CN          Pus         Y              15           142         All
          Pus         Y              Smith
16           142         CA          Pus         Y              16           142         All
          Pus         Y              Smith
17           142         US          Pus         S              NULL     NULL     NULL   
 NULL     NULL     NULL
18           101         CN          Tax         Y              18           101         All
          Tax         Y              Smith
19           101         CA          Tax         Y              19           101         All
          Tax         Y              Smith
20           101         USA       Tax         Y              20           101         All
          Tax         Y              Smith
21           101         US          Tax         X             21           101         All
          Tax         X             John

From: Kishore A [mailto:kishore.atmakuri@gmail.com<mailto:kishore.atmakuri@gmail.com>]
Sent: Wednesday, April 27, 2016 9:15 PM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: RE: Question on Implementing CASE in Hive Join


I feel the mail at your end has got a different format.

Remove * and - in the data what you see...That is my final data....You can correlate with
B

2. I need A LeftJoin B
On Apr 27, 2016 11:41 PM, "Markovitz, Dudu" <dmarkovitz@paypal.com<mailto:dmarkovitz@paypal.com>>
wrote:
Huston, we have a problem ☺

1.
Data

Here are the tables with tabs replaced by ‘*’ and spaces replaced by ‘-’.
A is a mismatch of tabs and spaces and B contains only (a very strange number of) spaces.

So you’re using TRIM with all A columns, but how exactly have you defined table B?


A
101*---US*------Tax*----X
101*---USA*------Tax*----Y
101*---CA*------Tax*----Y
101*---CN*------Tax*----Y
142*---US*------Pus*----S
142*---CA*------Pus*----Y
142*---CN*------Pus*----Y
101*---USA*------Pus*----Y
119*---USA*------Tax*----Y
129*---USA*------Tax*----Y
101*---USA*------STORE*----X
101*---USA*------OCP*----X
142*---CAN*------Feed*----X
101*---USA*------Feed*----X
101*---USA*------Win*----X
101*---USA*------Sal*----X
192*---US*------Mod*----X
142*---USA*------MO*----X
101*---US*------Retail*X
142*---US*------OAS*----X
101*---USA*------OAS*----Y

B
All--------All-----------Retail------All----John
All-------US----------OM----------X------John
ALL-----US---------OM----------Y------Smith
101-----All----------OAS---------All----John
142-----All----------OAS---------All----John
192-----All----------Mod----------All----Sean
101-----All----------Tax-----------X------John
101-----All---------Tax------------Y------Smith
142-----US---------Pus----------X------John
142-----All----------Pus----------Y------Smith--


2.
Requirement

“A left join B” guarantees that all the records from A appear in the result set and that
the number of rows in the result set is at least the number of rows of A.
You should take into consideration that a record in A might have several matches from B.

Also –
In your original question you’ve talked about “B LEFT JOIN A” and now it’s “A LEFT
JOIN B”.

3.
Scenario  1

I don’t want to start a lecture at that point, but ‘ON‘ and ‘WHERE’ have a completely
different meaning and use.
You’re conditions in the WHERE clause turned the LEFT JOIN to INNER JOIN.

4.
Scenario 2

I’ve suggested ‘LIKE’ not ‘=’ (but it won’t work because of Hive limitations that
require equality conditions for JOIN).

5.
Scenario 3

CROSS JOIN does not use ON (Hive lets you do that but it not an SQL standard and it’s actually
an INNER JOIN).

6.
CASE

CASE is defined by ANSI/ISO and works in Hive the same way it works in HQL the same way it
works in any other SQL dialect.

In conclusion -
The solution I gave you in the previous mails works as expected.
You should check your data and your tables’ definitions.

Just tell me if you want “B LEFT JOIN A” or “A LEFT JOIN B” and I’ll send you the
exact code.

Dudu

From: Kishore A [mailto:kishore.atmakuri@gmail.com<mailto:kishore.atmakuri@gmail.com>]
Sent: Wednesday, April 27, 2016 6:30 PM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: Re: Question on Implementing CASE in Hive Join

Hello Dudu,

Thank you for previous reply. I tried all the scenarios but nothing is working out. I need
this condition with with minimal steps like by using case when.

Please find the below


TableA
Code   Country    Type     Flag
101        US         Tax               X
101        USA      Tax               Y
101        CA        Tax               Y
101        CN        Tax               Y
142        US         Pus    S
142        CA        Pus    Y
142        CN        Pus    Y
101        USA      Pus    Y
119        USA      Tax               Y
129        USA      Tax               Y
101        USA      STORE        X
101        USA      OCP             X
142        CAN      Feed             X
101        USA      Feed             X
101        USA      Win              X
101        USA      Sal    X
192        US         Mod             X
142        USA      MO               X
101        US         Retail         X
142        US         OAS             X
101        USA      OAS             Y

TableB
Code   Country    Type     Flag   Value
All        All           Retail      All    John
All       US          OM          X      John
ALL     US         OM          Y      Smith
101     All          OAS         All    John
142     All          OAS         All    John
192     All          Mod          All    Sean
101     All          Tax           X      John
101     All         Tax            Y      Smith
142     US         Pus          X      John
142     All          Pus          Y      Smith


This is how my data in reality looks like. Type column is the fixed column here

Requirement : TableA LeftJoin TableB , stamp the matching Value from TableB. Total result
should have the number of rows as TableA

Scenario  1:

Select * FROM TableA a
    Left Join TableB b ON
    LOWER(TRIM(a.Type)) = LOWER(TRIM(b.Type))
    WHERE LOWER(TRIM(a.Code)) = (CASE WHEN LOWER(TRIM(b.Code)) = LOWER('All') THEN LOWER(TRIM(a.Code))
ELSE LOWER(TRIM(b.Code)) END)
    AND LOWER(TRIM(a.country)) = (CASE WHEN LOWER(TRIM(b.country)) = LOWER('All') THEN LOWER(TRIM(a.country))
ELSE LOWER(TRIM(b.country)) END)
    AND LOWER(TRIM(a.flag))  = (CASE WHEN LOWER(TRIM(b.flag)) = LOWER('All') THEN LOWER(TRIM(a.flag))
ELSE LOWER(TRIM(b.flag)) END);

on the output the number of rows should be same as the left table, but as I am apply WHERE
data is getting filtered out and is giving only
matching records



Scenario 2: Suggested by you

Select * FROM TableA a
    Left Join TableB b ON
    LOWER(TRIM(a.Type)) = LOWER(TRIM(b.Type))
    AND LOWER(TRIM(a.Code)) = (CASE WHEN LOWER(TRIM(b.Code)) = LOWER('All') THEN '%' ELSE
LOWER(TRIM(b.Code)) END)
    AND LOWER(TRIM(a.country)) = (CASE WHEN LOWER(TRIM(b.country)) = LOWER('All') THEN '%'
ELSE LOWER(TRIM(b.country)) END)
    AND LOWER(TRIM(a.flag))  = (CASE WHEN LOWER(TRIM(b.flag)) = LOWER('All') THEN '%'    
       ELSE LOWER(TRIM(b.flag)) END);

As per the left join this is giving me the number of rows just as the left side table but
I am not having the matching records as per the
right table. I have NULLs on the right side.

Scenario 3: Cross Join

Select * FROM TableA a
    CROSS Join TableB b ON
    LOWER(TRIM(a.Type)) = LOWER(TRIM(b.Type))
    AND LOWER(TRIM(a.Code)) = (CASE WHEN LOWER(TRIM(b.Code)) = LOWER('All') THEN '%' ELSE
LOWER(TRIM(b.Code)) END)
    AND LOWER(TRIM(a.country)) = (CASE WHEN LOWER(TRIM(b.country)) = LOWER('All') THEN '%'
ELSE LOWER(TRIM(b.country)) END)
    AND LOWER(TRIM(a.flag))  = (CASE WHEN LOWER(TRIM(b.flag)) = LOWER('All') THEN '%'    
       ELSE LOWER(TRIM(b.flag)) END);

Giving same result as Scenario 2 - All the left columns with NULLs on right

I don't know if we still need to fine tune the query as Hive is not supporting CASE WHEN just
as in SQL. Your thoughts are more helpful to me.

Kishore


On Wed, Apr 20, 2016 at 12:04 PM, Markovitz, Dudu <dmarkovitz@paypal.com<mailto:dmarkovitz@paypal.com>>
wrote:
The second version works as expected (after fixing a typo in the word ‘indicator’).
If you don’t get any results you should check your data (maybe the fields contains trailing
spaces or control characters etc.).

If you’re willing to replace the ‘OUTER’ with ‘INNER’, there’s another option
-

select      *

from                    b

            cross join  a

where       a.type      = b.type
        and a.code      like case b.code      when 'ALL' then '%' else b.code      end
        and a.indicator like case b.indicator when 'ALL' then '%' else b.indicator end
;

Dudu


From: Kishore A [mailto:kishore.atmakuri@gmail.com<mailto:kishore.atmakuri@gmail.com>]
Sent: Wednesday, April 20, 2016 5:04 PM

To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: Re: Question on Implementing CASE in Hive Join

Hi Dudu,

Thank you for sending queries around this.

I have run these queries and below are the observations

1. It did return the same error as before" SemanticException [Error 10017]: Line 4:4 Both
left and right aliases encountered in JOIN 'code'"

2. Query execution is successful but not retrieving any results out of it.

I am clueless and not able to proceed to next step until this is resolved. Do you have any
other suggestions please?

Kishore

On Tue, Apr 19, 2016 at 6:08 AM, Markovitz, Dudu <dmarkovitz@paypal.com<mailto:dmarkovitz@paypal.com>>
wrote:
Please try the following two options.
Option 2 might be better, performance wise (depending of the data volume and characteristics).

P.s.
I didn’t understand the explanation about the LEFT JOIN


Dudu

1.

select      b.code
           ,b.value

from                    b

            left join   a

            on          a.type      = b.type
                    and a.code      like case b.code       when 'ALL' then '%' else b.code
      end
                    and a.indicator like case b.indicatior when 'ALL' then '%' else b.indicatior
end
;



2.

select      b.code
           ,b.value

from                    b

            left join   a

            on          a.type      = b.type
                    and a.code      = b.code
                    and a.indicator = b.indicatior

where       b.code       != 'ALL'
        and b.indicatior != 'ALL'

union all

select      b.code
           ,b.value

from                    b

            left join   a

            on          a.type      = b.type
                    and a.indicator = b.indicatior

where       b.code        = 'ALL'
        and b.indicatior != 'ALL'

union all

select      b.code
           ,b.value

from                    b

            left join   a

            on          a.type      = b.type
                    and a.code      = b.code

where       b.code       != 'ALL'
        and b.indicatior  = 'ALL'

union all

select      b.code
           ,b.value

from                    b

            left join   a

            on          a.type      = b.type

where       b.code       = 'ALL'
        and b.indicatior = 'ALL'
;


From: Kishore A [mailto:kishore.atmakuri@gmail.com<mailto:kishore.atmakuri@gmail.com>]
Sent: Tuesday, April 19, 2016 3:51 PM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: Re: Question on Implementing CASE in Hive Join

Hi Dudu,

Actually we use both fields from left and right tables, I mentioned right table just for my
convenience to check whether ALL from right table can be pulled as per join condition match.

One more reason why we use left join is we should not have extra columns after join.

Kishore



On Tue, Apr 19, 2016 at 5:46 AM, Markovitz, Dudu <dmarkovitz@paypal.com<mailto:dmarkovitz@paypal.com>>
wrote:
Before dealing with the technical aspect, can you please explain what is the point of using
LEFT JOIN without selecting any field from table A?

Thanks

Dudu

From: Kishore A [mailto:kishore.atmakuri@gmail.com<mailto:kishore.atmakuri@gmail.com>]
Sent: Tuesday, April 19, 2016 2:29 PM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: Question on Implementing CASE in Hive Join

Hi,

I have a scenario to implement to cases in Hive Joins. I need to implement case on the value
on which join condition to be applied.

Table A
Code// Type// Indicator// Value//
A      1      XYZ         John
B      1      PQR         Smith
C      2      XYZ         John
C      2      PQR         Smith
D      3      PQR         Smith
E      3      XYZ         Smith
F      4      MNO         Smith
G      3      MNO         Smith
...

Mime
View raw message