hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Himanish Kushary <himan...@gmail.com>
Subject Re: Multiple Join giving different results compared to separate joins using intermediate table
Date Mon, 20 Aug 2012 15:22:38 GMT
Thanks Vinod.Why would the results vary ? Could you please give me some
pointers on why hive will treat them as different ?

On Sat, Aug 18, 2012 at 4:29 AM, Vinod Singh <vinod@vinodsingh.com> wrote:

> You may have to rewrite the query as-
>
> select <<above fields>> from
> (select <<above fields>> from A a join B b on (a.shopid=b.shopid and
> a.shopposition=b.shopposition)) D join C c(c.clientid=a.clientid)
>
> That will give results as per your expectation.
>
> Thanks,
> Vinod
>
>
> On Sat, Aug 18, 2012 at 8:20 AM, Himanish Kushary <himanish@gmail.com>wrote:
>
>> Hi,
>>
>> I am facing a weird issue.Probably I am missing something.Could somebody
>> please guide me.
>>
>> I have three tables in hive.
>>
>> Table A - clientid,shopid,shopposition,shopdate,shopitemid,shopitemdescr
>> (partitioned by date)
>> Table B- shopid,shopposition (distinct shopid,shopposition from another
>> table)
>> Table C - clientid,clientname,clientcity
>>
>> I would like to join the tables above to find the following information
>> for a date
>> - shopid,shopposition,shopdate,shopitemid,shopitemdescr,clientname,clientcity
>>
>> I wrote something like - select <<above fields>> from A a join B b on
>> (a.shopid=b.shopid and a.shopposition=b.shopposition) join C
>> c(c.clientid=a.clientid)
>>
>> The result is not what I expected and has lot of duplicates.
>>
>> Whereas if I create a seperate table from the first join - create table D
>> as select <<relevant fields>> from A a join B b on (a.shopid=b.shopid
and
>> a.shopposition=b.shopposition)
>>
>> and then join with C - select <<fields>> from D d join C c on (
>> d.clientid=c.clientid) the results are as expected.
>>
>> For example if Table A has 8 rows then following the second approach
>> gives me 8 rows with proper fields but the the former approach I get lot of
>> rows ( probably a cartesian product)
>>
>> I am confused about whats going wrong with the first approach,could
>> somebody throw some light please.
>>
>> ---------------------------
>> Thanks & Regards
>> Himanish
>>
>
>


-- 
Thanks & Regards
Himanish

Mime
View raw message