hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Markovitz, Dudu" <dmarkov...@paypal.com>
Subject RE: Question on Implementing CASE in Hive Join
Date Wed, 27 Apr 2016 18:10:40 GMT
Huston, we have a problem ☺

1.
Data

Here are the tables with tabs replaced by ‘*’ and spaces replaced by ‘-’.
A is a mismatch of tabs and spaces and B contains only (a very strange number of) spaces.

So you’re using TRIM with all A columns, but how exactly have you defined table B?


A
101*---US*------Tax*----X
101*---USA*------Tax*----Y
101*---CA*------Tax*----Y
101*---CN*------Tax*----Y
142*---US*------Pus*----S
142*---CA*------Pus*----Y
142*---CN*------Pus*----Y
101*---USA*------Pus*----Y
119*---USA*------Tax*----Y
129*---USA*------Tax*----Y
101*---USA*------STORE*----X
101*---USA*------OCP*----X
142*---CAN*------Feed*----X
101*---USA*------Feed*----X
101*---USA*------Win*----X
101*---USA*------Sal*----X
192*---US*------Mod*----X
142*---USA*------MO*----X
101*---US*------Retail*X
142*---US*------OAS*----X
101*---USA*------OAS*----Y

B
All--------All-----------Retail------All----John
All-------US----------OM----------X------John
ALL-----US---------OM----------Y------Smith
101-----All----------OAS---------All----John
142-----All----------OAS---------All----John
192-----All----------Mod----------All----Sean
101-----All----------Tax-----------X------John
101-----All---------Tax------------Y------Smith
142-----US---------Pus----------X------John
142-----All----------Pus----------Y------Smith--


2.
Requirement

“A left join B” guarantees that all the records from A appear in the result set and that
the number of rows in the result set is at least the number of rows of A.
You should take into consideration that a record in A might have several matches from B.

Also –
In your original question you’ve talked about “B LEFT JOIN A” and now it’s “A LEFT
JOIN B”.

3.
Scenario  1

I don’t want to start a lecture at that point, but ‘ON‘ and ‘WHERE’ have a completely
different meaning and use.
You’re conditions in the WHERE clause turned the LEFT JOIN to INNER JOIN.

4.
Scenario 2

I’ve suggested ‘LIKE’ not ‘=’ (but it won’t work because of Hive limitations that
require equality conditions for JOIN).

5.
Scenario 3

CROSS JOIN does not use ON (Hive lets you do that but it not an SQL standard and it’s actually
an INNER JOIN).

6.
CASE

CASE is defined by ANSI/ISO and works in Hive the same way it works in HQL the same way it
works in any other SQL dialect.

In conclusion -
The solution I gave you in the previous mails works as expected.
You should check your data and your tables’ definitions.

Just tell me if you want “B LEFT JOIN A” or “A LEFT JOIN B” and I’ll send you the
exact code.

Dudu

From: Kishore A [mailto:kishore.atmakuri@gmail.com]
Sent: Wednesday, April 27, 2016 6:30 PM
To: user@hive.apache.org
Subject: Re: Question on Implementing CASE in Hive Join

Hello Dudu,

Thank you for previous reply. I tried all the scenarios but nothing is working out. I need
this condition with with minimal steps like by using case when.

Please find the below


TableA
Code   Country    Type     Flag
101        US         Tax               X
101        USA      Tax               Y
101        CA        Tax               Y
101        CN        Tax               Y
142        US         Pus    S
142        CA        Pus    Y
142        CN        Pus    Y
101        USA      Pus    Y
119        USA      Tax               Y
129        USA      Tax               Y
101        USA      STORE        X
101        USA      OCP             X
142        CAN      Feed             X
101        USA      Feed             X
101        USA      Win              X
101        USA      Sal    X
192        US         Mod             X
142        USA      MO               X
101        US         Retail         X
142        US         OAS             X
101        USA      OAS             Y

TableB
Code   Country    Type     Flag   Value
All        All           Retail      All    John
All       US          OM          X      John
ALL     US         OM          Y      Smith
101     All          OAS         All    John
142     All          OAS         All    John
192     All          Mod          All    Sean
101     All          Tax           X      John
101     All         Tax            Y      Smith
142     US         Pus          X      John
142     All          Pus          Y      Smith


This is how my data in reality looks like. Type column is the fixed column here

Requirement : TableA LeftJoin TableB , stamp the matching Value from TableB. Total result
should have the number of rows as TableA

Scenario  1:

Select * FROM TableA a
    Left Join TableB b ON
    LOWER(TRIM(a.Type)) = LOWER(TRIM(b.Type))
    WHERE LOWER(TRIM(a.Code)) = (CASE WHEN LOWER(TRIM(b.Code)) = LOWER('All') THEN LOWER(TRIM(a.Code))
ELSE LOWER(TRIM(b.Code)) END)
    AND LOWER(TRIM(a.country)) = (CASE WHEN LOWER(TRIM(b.country)) = LOWER('All') THEN LOWER(TRIM(a.country))
ELSE LOWER(TRIM(b.country)) END)
    AND LOWER(TRIM(a.flag))  = (CASE WHEN LOWER(TRIM(b.flag)) = LOWER('All') THEN LOWER(TRIM(a.flag))
ELSE LOWER(TRIM(b.flag)) END);

on the output the number of rows should be same as the left table, but as I am apply WHERE
data is getting filtered out and is giving only
matching records



Scenario 2: Suggested by you

Select * FROM TableA a
    Left Join TableB b ON
    LOWER(TRIM(a.Type)) = LOWER(TRIM(b.Type))
    AND LOWER(TRIM(a.Code)) = (CASE WHEN LOWER(TRIM(b.Code)) = LOWER('All') THEN '%' ELSE
LOWER(TRIM(b.Code)) END)
    AND LOWER(TRIM(a.country)) = (CASE WHEN LOWER(TRIM(b.country)) = LOWER('All') THEN '%'
ELSE LOWER(TRIM(b.country)) END)
    AND LOWER(TRIM(a.flag))  = (CASE WHEN LOWER(TRIM(b.flag)) = LOWER('All') THEN '%'    
       ELSE LOWER(TRIM(b.flag)) END);

As per the left join this is giving me the number of rows just as the left side table but
I am not having the matching records as per the
right table. I have NULLs on the right side.

Scenario 3: Cross Join

Select * FROM TableA a
    CROSS Join TableB b ON
    LOWER(TRIM(a.Type)) = LOWER(TRIM(b.Type))
    AND LOWER(TRIM(a.Code)) = (CASE WHEN LOWER(TRIM(b.Code)) = LOWER('All') THEN '%' ELSE
LOWER(TRIM(b.Code)) END)
    AND LOWER(TRIM(a.country)) = (CASE WHEN LOWER(TRIM(b.country)) = LOWER('All') THEN '%'
ELSE LOWER(TRIM(b.country)) END)
    AND LOWER(TRIM(a.flag))  = (CASE WHEN LOWER(TRIM(b.flag)) = LOWER('All') THEN '%'    
       ELSE LOWER(TRIM(b.flag)) END);

Giving same result as Scenario 2 - All the left columns with NULLs on right

I don't know if we still need to fine tune the query as Hive is not supporting CASE WHEN just
as in SQL. Your thoughts are more helpful to me.

Kishore


On Wed, Apr 20, 2016 at 12:04 PM, Markovitz, Dudu <dmarkovitz@paypal.com<mailto:dmarkovitz@paypal.com>>
wrote:
The second version works as expected (after fixing a typo in the word ‘indicator’).
If you don’t get any results you should check your data (maybe the fields contains trailing
spaces or control characters etc.).

If you’re willing to replace the ‘OUTER’ with ‘INNER’, there’s another option
-

select      *

from                    b

            cross join  a

where       a.type      = b.type
        and a.code      like case b.code      when 'ALL' then '%' else b.code      end
        and a.indicator like case b.indicator when 'ALL' then '%' else b.indicator end
;

Dudu


From: Kishore A [mailto:kishore.atmakuri@gmail.com<mailto:kishore.atmakuri@gmail.com>]
Sent: Wednesday, April 20, 2016 5:04 PM

To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: Re: Question on Implementing CASE in Hive Join

Hi Dudu,

Thank you for sending queries around this.

I have run these queries and below are the observations

1. It did return the same error as before" SemanticException [Error 10017]: Line 4:4 Both
left and right aliases encountered in JOIN 'code'"

2. Query execution is successful but not retrieving any results out of it.

I am clueless and not able to proceed to next step until this is resolved. Do you have any
other suggestions please?

Kishore

On Tue, Apr 19, 2016 at 6:08 AM, Markovitz, Dudu <dmarkovitz@paypal.com<mailto:dmarkovitz@paypal.com>>
wrote:
Please try the following two options.
Option 2 might be better, performance wise (depending of the data volume and characteristics).

P.s.
I didn’t understand the explanation about the LEFT JOIN


Dudu

1.

select      b.code
           ,b.value

from                    b

            left join   a

            on          a.type      = b.type
                    and a.code      like case b.code       when 'ALL' then '%' else b.code
      end
                    and a.indicator like case b.indicatior when 'ALL' then '%' else b.indicatior
end
;



2.

select      b.code
           ,b.value

from                    b

            left join   a

            on          a.type      = b.type
                    and a.code      = b.code
                    and a.indicator = b.indicatior

where       b.code       != 'ALL'
        and b.indicatior != 'ALL'

union all

select      b.code
           ,b.value

from                    b

            left join   a

            on          a.type      = b.type
                    and a.indicator = b.indicatior

where       b.code        = 'ALL'
        and b.indicatior != 'ALL'

union all

select      b.code
           ,b.value

from                    b

            left join   a

            on          a.type      = b.type
                    and a.code      = b.code

where       b.code       != 'ALL'
        and b.indicatior  = 'ALL'

union all

select      b.code
           ,b.value

from                    b

            left join   a

            on          a.type      = b.type

where       b.code       = 'ALL'
        and b.indicatior = 'ALL'
;


From: Kishore A [mailto:kishore.atmakuri@gmail.com<mailto:kishore.atmakuri@gmail.com>]
Sent: Tuesday, April 19, 2016 3:51 PM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: Re: Question on Implementing CASE in Hive Join

Hi Dudu,

Actually we use both fields from left and right tables, I mentioned right table just for my
convenience to check whether ALL from right table can be pulled as per join condition match.

One more reason why we use left join is we should not have extra columns after join.

Kishore



On Tue, Apr 19, 2016 at 5:46 AM, Markovitz, Dudu <dmarkovitz@paypal.com<mailto:dmarkovitz@paypal.com>>
wrote:
Before dealing with the technical aspect, can you please explain what is the point of using
LEFT JOIN without selecting any field from table A?

Thanks

Dudu

From: Kishore A [mailto:kishore.atmakuri@gmail.com<mailto:kishore.atmakuri@gmail.com>]
Sent: Tuesday, April 19, 2016 2:29 PM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: Question on Implementing CASE in Hive Join

Hi,

I have a scenario to implement to cases in Hive Joins. I need to implement case on the value
on which join condition to be applied.

Table A
Code// Type// Indicator// Value//
A      1      XYZ         John
B      1      PQR         Smith
C      2      XYZ         John
C      2      PQR         Smith
D      3      PQR         Smith
E      3      XYZ         Smith
F      4      MNO         Smith
G      3      MNO         Smith
D      1      XYZ         John
N      3      STR         Smith


Table B
Code// Type// Indicator// Value//
ALL    1      XYZ         John
D        3      ALL         Smith
ALL    1      PQR         Smith

I need to stamp Value from TableB by joining TableA and I am writing join condition as below.
Note : No instance of ALL for Type column, a value for Type will be provided.

Select b.Code,b.Value from B
LEFT JOIN A a ON
a.Code = (case when b.Code = 'ALL' then a.Code else b.Code END)
AND
a.Type = b.Type
AND
a.Indicator = (case when b.Indicatior = 'ALL' then a.Inidicator else b.Inidicator END)

When I run this in hive this query is failing with below error
Error while compiling statement: FAILED: SemanticException [Error 10017]: Line 4:0 Both left
and right aliases encountered in JOIN 'Code'.


Please let me know if more details are needed

Thanks,
Kishore




Mime
View raw message