hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ramasubramanian Narayanan <ramasubramanian.naraya...@gmail.com>
Subject Re: LEFT JOIN and WHERE CLAUSE - How to handle
Date Wed, 23 Aug 2017 15:21:33 GMT
Hi,

TXN.TXN_DT should be between CURRENCY.EFF_ST_DT and CURRENCY.EFF_END_DT. It
needs to be equated.


regards,
Rams

On Wed, Aug 23, 2017 at 7:55 PM, Furcy Pin <furcy.pin@flaminem.com> wrote:

> I would suggest to use a subquery
>
> WITH unique_currency AS (
>   SELECT
>     CCY_CD,
>     MAX(CNTRY_DESC) as CNTRY_DESC
>   FROM CURRENCY
>   GROUP BY CCY_CD
> )
>
> and then perform your left join on it.
>
> Some SQL engine (e.g. Presto) have aggregation functions like
> arbitrary(col) that take any value and are a little less costly than a
> max.
> Sometimes, they also have functions like max_by(x, y)
> <https://prestodb.io/docs/current/functions/aggregate.html#max_by> that
> would allow you to get the most recent description.
>
> It is a shame that this function is not included in Hive yet, but still
> you can find some UDAF implementations on github
> <https://github.com/dataiku/dataiku-hive-udf#first_of_group-last_of_group>
> .
>
>
>
> On Wed, Aug 23, 2017 at 3:37 PM, Ramasubramanian Narayanan <
> ramasubramanian.narayanan@gmail.com> wrote:
>
>> Hi,
>>
>> Need your suggestion on the below.
>>
>> Have two tables TXN and CURRENCY.
>>
>> Need all records in TXN and hence doing Left Join with CURRENCY.
>>
>> *Two problems :*
>> 1. CURRENCY table may contain duplicate records hence it needs to be
>> handled through RANK or some other function.
>> 2. If we equate TXN_DT between EFF_ST_DT and EFF_END_DT in the 'where
>> clause' then we will loose the EUR records which should not happen.
>>
>> Please suggest a solution to over come both the problems. For duplicated
>> records it is fine if we select any of the CNTRY_DESC.
>>
>> *Table : CURRENCY*
>>
>> *Table : TXN*
>>
>> *CCY_CD*
>>
>> *CNTRY_DESC*
>>
>> *EFF_ST_DT*
>>
>> *EFF_END_DT*
>>
>> *ROW_NUM*
>>
>> *CCY_CD*
>>
>> *TXN_DT*
>>
>> INR
>>
>> Indian Rupee
>>
>> 1-Jan-15
>>
>> 20-Feb-16
>>
>> 1
>>
>> INR
>>
>> 16-Feb-17
>>
>> INR
>>
>> Indian Rupee New
>>
>> 21-Feb-16
>>
>> 20-Feb-99
>>
>> 2
>>
>> USD
>>
>> 16-Feb-17
>>
>> USD
>>
>> US Dollar
>>
>> 1-Jan-15
>>
>> 20-Feb-16
>>
>> 3
>>
>> SGD
>>
>> 16-Feb-17
>>
>> SGD
>>
>> Singapore Dollar
>>
>> 1-Jan-15
>>
>> 20-Feb-17
>>
>> 4
>>
>> EUR
>>
>> 16-Feb-17
>>
>> SGD
>>
>> Singapore Dollar New
>>
>> 15-Feb-17
>>
>> 20-Feb-99
>>
>> SGD
>>
>> Singapore Dollar Latest
>>
>> 16-Feb-17
>>
>> 16-Feb-17
>>
>> *Expected Output*
>>
>> *ROW_NUM*
>>
>> *CCY_CD*
>>
>> *TXN_DT*
>>
>> *CNTRY_DESC*
>>
>> 1
>>
>> INR
>>
>> 16-Feb-17
>>
>> Indian Rupee
>>
>> 2
>>
>> USD
>>
>> 16-Feb-17
>>
>> US Dollar
>>
>> 3
>>
>> SGD
>>
>> 16-Feb-17
>>
>> Singapore Dollar Latest (Any of three valid valid is fine)
>>
>> 4
>>
>> EUR
>>
>> 16-Feb-17
>>
>> <Null>
>>
>>
>>
>> *Query : *Select ROW_NUM,CCY_CD,TXN_DT,CNTRY_DESC
>> from CURRENCY LEFT JOIN TXN on (CURRENCY.CCY_CD = TXN.CCY_CD)
>> where
>> TXN_DT between EFF_ST_DT and EFF_END_DT;
>>
>>
>>
>> This query will drop the "EUR" record because of the where clause used.
>> It cannot be handled with case statement instead of 'where clause' as we
>> have   more than one record for 'SGD' when  TXN_DT is 16-FEB.
>>
>> regards,
>> Rams
>>
>
>

Mime
View raw message