hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mich Talebzadeh" <m...@peridale.co.uk>
Subject RE: HIVE : LEFT OUTER JOIN + NON-EQUI JOIN
Date Sat, 09 Jan 2016 13:12:45 GMT
Can you do two paths?

 

create temporary table tmp AS

SELECT zz.report_dt, vss.eff_dt, vss.disc_dt, … rest of columns

FROM rvsed11 zz 
LEFT OUTER JOIN rvsed22 vss 
  ON zz.company_id = vss.company_id 
  AND zz.shares_ship_id = vss.shares_ship_id 

;

select <needed columns> from #tmp

Where 

report_dt >= eff_dt  AND report_dt < disc_dt;

 

that should return correct results

 

HTH,

 

Mich



 

From: @Sanjiv Singh [mailto:sanjiv.is.on@gmail.com] 
Sent: 09 January 2016 12:19
To: user@hive.apache.org; dev@hive.apache.org
Subject: HIVE : LEFT OUTER JOIN + NON-EQUI JOIN

 

Hi All,



I am having issue hive LEFT OUTER JOIN.


I had al table in sql-server. then used sqoop to migrate all tables on hive.  

This is the original query from sql-server which contains non-equi  LEFT OUTER  JOIN.  both
table have cartesian data.


SELECT 
  vss.company_id,vss.shares_ship_id,vss.seatmap_cd,vss.cabin,vss.seat,  vss.seat_loc_dscr,
vss.ep_seat AS EPlus_Seat, vss.ep_win_seat, vss.ep_asle_seat, vss.ep_mid_seat, vss.em_win_seat,

  vss.em_mid_seat,vss.em_asle_seat,vss.y_win_seat,  vss.y_mid_seat, vss.y_asle_seat,  vss.fj_win_seat,
vss.fj_mid_seat,  vss.fj_asle_seat,vss.exit_row,  vss.bulkhead_row, vss.eff_dt, vss.disc_dt

FROM rvsed11 zz 
LEFT OUTER JOIN rvsed22 vss 
  ON zz.company_id = vss.company_id 
  AND zz.shares_ship_id = vss.shares_ship_id 
  AND zz.report_dt >= vss.eff_dt 
  AND zz.report_dt < vss.disc_dt;

As we know that Nonequi joins are not working in hive ( Nonequi joins working in WHERE clause
but we cannot use with LEFT OUTER JOIN).

See below hive query with noon-equi condition moved to where clause.


SELECT 
  vss.company_id,vss.shares_ship_id,vss.seatmap_cd,vss.cabin,vss.seat,  vss.seat_loc_dscr,
vss.ep_seat AS EPlus_Seat, vss.ep_win_seat, vss.ep_asle_seat, vss.ep_mid_seat, vss.em_win_seat,

  vss.em_mid_seat,vss.em_asle_seat,vss.y_win_seat,  vss.y_mid_seat, vss.y_asle_seat,  vss.fj_win_seat,
vss.fj_mid_seat,  vss.fj_asle_seat,vss.exit_row,  vss.bulkhead_row, vss.eff_dt, vss.disc_dt

FROM rvsed11 zz 
LEFT OUTER JOIN rvsed22 vss 
  ON zz.company_id = vss.company_id 
  AND zz.shares_ship_id = vss.shares_ship_id 
WHERE zz.report_dt >= vss.eff_dt 
  AND zz.report_dt < vss.disc_dt;



Original query is giving 1162 records on Sql-Server , but  this hive query giving 46240 records.

I tried multiple workaround to get same logic , but didn't get same result on hive. 

Can you please help me on this to identify this issue and get query working on hive with same
result set. 

Let me know you need other details.




Regards
Sanjiv Singh
Mob :  +091 9990-447-339


Mime
View raw message