Thanks Mark for writing back to me.
I did the same thing and still not getting the desired result.
I was looking for the cause. I managed to get the desired result. Sharing
my thought process and how I did it. Let me know your thoughts on this,
that would be really nice to have. Also please share any better idea to
achieve this result and feel free to point out if you think that my
explanation is wrong.
I think you noticed the fact that I want the value column for both the ID_1
and ID_2 and there values in these columns are not UNIQUE.
It happened that I used to get more rows returned than the original no of
rows in the second table due to the fact that there are duplicates in ID_1
and ID_2 column.
So first of all to get the Value column for both ID_1 and ID_2 I have to
join twice and that too I can not have join on condition which have
duplicates.
So I created a new table out of table 2 with addition of a column having
unique value for each row using HIVE UDF.
Then I created 2 views each for ID_1 and ID_2 which contains the
description by joining on the condition a.ID_1 = b.ID.
Now I have two views and then I joined these two views on the condition
that view1.uniqueid = view2.uniqueid.
That gives me desired output.
Looking forward to have your views.
Thanks and regards,
Souvik.
On Fri, Dec 28, 2012 at 7:40 AM, Mark Grover <grover.markgrover@gmail.com>wrote:
> Souvik,
> In your new example, you need a RIGHT OUTER JOIN between table1 and table2
> (order matters  table1 on left, table2 on right) on the ID1 column.
> Something like this (untested by me):
> SELECT
> table1.*,
> table2.*
> FROM
> table1
> RIGHT OUTER JOIN table2
> ON (table1.id=table2.id_1);
>
> Mark
>
>
> On Thu, Dec 27, 2012 at 9:26 PM, Souvik Banerjee <souvikbanerjee@gmail.com
> > wrote:
>
>> Thanks a lot Mark for your attention.
>> But I think I cannot go for INNER join, the reason behind the fact being
>> that I want all rows of Table3 irrespective of there is any row
>> corresponding to that ID in Table 1 or Table 2.
>> Probably I would have taken care of that while providing the example.
>> The more refined example would be like below.
>>
>> Look forward for your help.
>>
>> Thanks and regards,
>> Souvik.
>>
>> P.S. Dropped one table from the earlier example and Now we have table 1
>> and table 2, we are looking for table 4. (Hope so it's more simple and it's
>> exactly we need)
>>
>>
>>
>> *Table 1*
>>
>> ID
>>
>> Value
>>
>> 1
>>
>> V11
>>
>> 2
>>
>> V12
>>
>> 3
>>
>> V13
>>
>>
>>
>> *Table 2*
>>
>> ID_1
>>
>> ID_2
>>
>> Other_Column1
>>
>> 1
>>
>> 3
>>
>> C1
>>
>> 2
>>
>> 1
>>
>> C2
>>
>> 3
>>
>> 2
>>
>> C3
>>
>> 4
>>
>> 9
>>
>> C4
>>
>> 12
>>
>> 3
>>
>> C5
>>
>> 1
>>
>> 8
>>
>> C6
>>
>>
>>
>> I want to formulate a table which would look like (*Table 4*)
>>
>> ID_1
>>
>> ID_1_Value
>>
>> ID_2
>>
>> ID_2_value
>>
>> Other_Column1
>>
>> 1
>>
>> V11
>>
>> 3
>>
>> V13
>>
>> C1
>>
>> 2
>>
>> V12
>>
>> 1
>>
>> V11
>>
>> C2
>>
>> 3
>>
>> V13
>>
>> 2
>>
>> V12
>>
>> C3
>>
>> 4
>>
>> NULL / Empty
>>
>> 9
>>
>> NULL / Empty
>>
>> C4
>>
>> 12
>>
>> NULL / Empty
>>
>> 3
>>
>> V13
>>
>> C5
>>
>> 1
>>
>> V11
>>
>> 8
>>
>> NULL / Empty
>>
>> C6
>>
>>
>> On Thu, Dec 27, 2012 at 9:06 PM, Mark Grover <grover.markgrover@gmail.com
>> > wrote:
>>
>>> Souvik,
>>> Let me begin by saying that simplifying the problem goes a long way in
>>> helping us answer your question. You did it really nicely here, so thank
>>> you for doing that.
>>>
>>> Why don't you use INNER JOIN instead of LEFT SEMI JOIN? You can express
>>> the same query as INNER JOIN with no restrictions on what can be included
>>> in the SELECT clause. LEFT SEMI JOIN just implements an "exists" like query
>>> efficiently. If you want columns from the right table in your select list,
>>> just do the regular (aka inefficient way) inner join.
>>>
>>> Of course, you can optimize inner joins as map joins, sorted merge join
>>> or sorted merge bucketed joins depending on your use case.
>>>
>>> Mark
>>>
>>> On Thu, Dec 27, 2012 at 3:16 PM, Souvik Banerjee <
>>> souvikbanerjee@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am struggling with a problem described below.
>>>> Any help how to resolve this problem is highly appreciated.
>>>>
>>>> I have got few tables the structure is over simplified for the sake of
>>>> describing the nature of the problem that I am facing.
>>>>
>>>> *Table 1*
>>>> * *
>>>>
>>>>
>>>>
>>>> ID
>>>>
>>>> Value
>>>>
>>>> 1
>>>>
>>>> V11
>>>>
>>>> 2
>>>>
>>>> V12
>>>>
>>>> 3
>>>>
>>>> V13
>>>>
>>>>
>>>>
>>>> *Table 2*
>>>> * *
>>>>
>>>>
>>>>
>>>> ID
>>>>
>>>> Value
>>>>
>>>> 1
>>>>
>>>> V21
>>>>
>>>> 2
>>>>
>>>> V22
>>>>
>>>> 3
>>>>
>>>> V23
>>>>
>>>>
>>>>
>>>> *Table 3*
>>>> * *
>>>>
>>>>
>>>>
>>>> ID_1
>>>>
>>>> ID_2
>>>>
>>>> Other_Column1
>>>>
>>>> 1
>>>>
>>>> 3
>>>>
>>>> C1
>>>>
>>>> 2
>>>>
>>>> 1
>>>>
>>>> C2
>>>>
>>>> 3
>>>>
>>>> 2
>>>>
>>>> C3
>>>>
>>>>
>>>>
>>>> I want to formulate a table which would look like
>>>>
>>>> ID_1
>>>>
>>>> ID_1_Value
>>>>
>>>> ID_2
>>>>
>>>> ID_2_value
>>>>
>>>> Other_Column1
>>>>
>>>> 1
>>>>
>>>> V11
>>>>
>>>> 3
>>>>
>>>> V23
>>>>
>>>> C1
>>>>
>>>> 2
>>>>
>>>> V12
>>>>
>>>> 1
>>>>
>>>> V21
>>>>
>>>> C2
>>>>
>>>> 3
>>>>
>>>> V13
>>>>
>>>> 2
>>>>
>>>> V22
>>>>
>>>> C3
>>>>
>>>>
>>>>
>>>> I am facing problem with this.
>>>> I tried to LEFT SEMI JOIN in Hive.
>>>>
>>>> I tried to do it in two steps (For 3 tables)
>>>> First step I wanted to do a LEFT SEMI JOIN with TABLE 1 and TABLE 3.
>>>> But the problem is that in LEFT SEMI JOIN you can not have columns from
>>>> the right table in the select clause. So after join my new table simply
>>>> looks like TABLE 3.
>>>>
>>>> Can you help me how I can achieve this is HIVE.
>>>>
>>>> Thanks and regards,
>>>> Souvik.
>>>>
>>>
>>>
>>
>
