hive-user mailing list archives

From Bertrand Dechoux <decho...@gmail.com>
Subject Re: Can't use OR in left join
Date Thu, 26 Jul 2012 16:15:11 GMT
In most cases, a join is implemented with a group by.

Rows in your table a and your table b will be grouped by something, let's
say the value of your column id.
So for each group, doing a join is a trivial operation. The simple way is to
get all values, separate them somehow to know which are from the a table
and which are from the b table, and then emit all couples (row_a,row_b) for
your value of "id".
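
As a rough illustration of the grouping described above, here is a minimal
in-memory sketch in Python (not Hadoop or Hive code). The name
reduce_side_join and the key_a/key_b parameters are made up for this
example; it only mimics the tag, group and pair-up steps of an inner join
on equality.

```python
from collections import defaultdict


def reduce_side_join(rows_a, rows_b, key_a, key_b):
    """Inner equi-join done the MapReduce way: tag, group by key, pair up."""
    groups = defaultdict(lambda: {"a": [], "b": []})

    # "Map" step: each row is emitted under exactly one key, and the slot
    # ("a" or "b") records which table it came from.
    for row in rows_a:
        groups[key_a(row)]["a"].append(row)
    for row in rows_b:
        groups[key_b(row)]["b"].append(row)

    # "Reduce" step: inside one group, emit every (row_a, row_b) couple.
    joined = []
    for key in sorted(groups):
        for row_a in groups[key]["a"]:
            for row_b in groups[key]["b"]:
                joined.append((row_a, row_b))
    return joined
```

Note that key_a and key_b are applied to each row on its own, before any
grouping happens. That is the point where the key has to be known.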

But if you want to do an OR, there is no way to express it during the group
by. You must be able to define, before the group by, what its key will
be.

I am not saying that you cannot solve your problem, only that the
restriction on OR comes from the MapReduce paradigm.
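
One possible way around it, sketched in plain Python under some assumptions:
because both candidate keys (the full pid and its first 27 characters) can be
computed from a row of a alone, each row of a can be emitted under both keys
before grouping. The function name left_join_full_or_prefix and the
prefix_len parameter are hypothetical, and the tables are reduced to lists
of pid strings. This is an illustration of the idea, not how Hive does it.

```python
from collections import defaultdict


def left_join_full_or_prefix(a_pids, b_pids, prefix_len=27):
    """Emulate a LEFT JOIN b ON (a.pid = b.pid OR substr(a.pid, 1, prefix_len) = b.pid)."""
    groups = defaultdict(lambda: {"a": [], "b": []})

    # Emit each row of a under both candidate keys. The set collapses them
    # into one key when pid is no longer than prefix_len, so no couple is
    # produced twice.
    for index, pid in enumerate(a_pids):
        for key in {pid, pid[:prefix_len]}:
            groups[key]["a"].append(index)
    for pid in b_pids:
        groups[pid]["b"].append(pid)

    # Ordinary equality grouping: collect the b rows sharing a key with each a row.
    matches = defaultdict(list)
    for group in groups.values():
        for index in group["a"]:
            matches[index].extend(group["b"])

    # Left join semantics: a row of a without any partner is kept with None.
    result = []
    for index, pid in enumerate(a_pids):
        partners = sorted(matches[index]) or [None]
        result.extend((pid, partner) for partner in partners)
    return result
```

In Hive terms this corresponds roughly to running one equality join per
key and combining the results, which stays within what a group by can
express.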

I hope it is clearer for you. Knowing what MapReduce is could really help
you. It does not mean you need to know Java, but you should understand
how the data is manipulated.

Bertrand

On Thu, Jul 26, 2012 at 5:34 PM, 周彩钦 <caiqinzhou@gmail.com> wrote:

> Thanks Bertrand,
> You said it's a Hadoop problem. Does that mean that if I switch to
> MapReduce (Java MR or streaming), I still can't achieve this?
> PS: I'm not very familiar with Java MR and streaming :) but I have to find
> a way to implement it.
>
>
> On Thu, Jul 26, 2012 at 11:19 PM, Bertrand Dechoux <dechouxb@gmail.com> wrote:
>
>> That's a problem which is Hadoop related and not really Hive related.
>> The solution is to use only equality (as you know). For that, you should
>> first extract your real identifier for a, which can be a.pid or a part of
>> it.
>> I assume that you can know in advance which one will be used.
>>
>> Bertrand
>>
>>
>>
>> On Thu, Jul 26, 2012 at 5:11 PM, 周彩钦 <caiqinzhou@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I have a problem when using left join with Hive 0.7.1.
>>> I have the query below:
>>>
>>> select
>>>   a.pid,
>>>   b.pid
>>> from tab1 a
>>>   left join
>>> tab2 b
>>>   on (a.pid=b.pid or substr(a.pid,1,27)=b.pid);
>>>
>>> But Hive doesn't support "OR" in left join.
>>> Table a is huge, and table b has 40000 rows now (will increase).
>>> Is there any other solution to achieve this?
>>>
>>> Thanks very much.
>>>
>>> --
>>>
>>>
>>
>>
>> --
>> Bertrand Dechoux
>>
>
>
>
> --
> /**********************************************************/
> // Name: 周彩钦
> // Phone: 15210364513
> // E-mail:caiqinzhou@gmail.com
> /**********************************************************/
>



-- 
Bertrand Dechoux
