hive-user mailing list archives

From 周彩钦 <caiqinz...@gmail.com>
Subject Re: Can't use OR in left join
Date Thu, 26 Jul 2012 16:39:16 GMT
Hi Bertrand,
Thanks for your quick reply, I've got it now.

Thanks.

On Fri, Jul 27, 2012 at 12:15 AM, Bertrand Dechoux <dechouxb@gmail.com> wrote:

> In most cases, a join is implemented as a group by.
>
> Rows from your table a and your table b are grouped by some key, let's
> say the value of your column id.
> Within each group, doing the join is then a trivial operation. The simple way is
> to collect all the values, tag them so you know which come from table a
> and which come from table b, and then emit every pair (row_a, row_b) for
> that value of "id".
>
> But if you want to use an OR, there is no way to express it in the group
> by. You must be able to define the grouping key before the group by
> runs.
>
> I am not saying that you cannot solve your problem, only that the OR
> restriction comes from the MapReduce paradigm.
>
> I hope this is clearer now. Knowing what MapReduce is could really help
> you. That does not mean you need to know Java, but you should understand
> how the data is manipulated.
>
> Bertrand
>
>
> On Thu, Jul 26, 2012 at 5:34 PM, 周彩钦 <caiqinzhou@gmail.com> wrote:
>
>> Thanks Bertrand,
>> You said it's a Hadoop problem. Does that mean that if I switch to
>> MapReduce (Java MR or streaming), I still can't achieve this?
>> PS: I'm not very familiar with Java MR or streaming :) but I have to
>> find a way to implement it.
>>
>>
>> On Thu, Jul 26, 2012 at 11:19 PM, Bertrand Dechoux <dechouxb@gmail.com> wrote:
>>
>>> That's a problem which is Hadoop related and not really Hive related.
>>> The solution is to use only equality conditions (as you already know). For that, you should
>>> first extract the real identifier for a, which can be a.pid or a part of
>>> it.
>>> I assume that you can know in advance which one will be used.
>>>
>>> Bertrand
>>>
>>>
>>>
>>> On Thu, Jul 26, 2012 at 5:11 PM, 周彩钦 <caiqinzhou@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I have a problem when using left join with Hive 0.7.1.
>>>> My query is below:
>>>>
>>>> select
>>>>   a.pid,
>>>>   b.pid
>>>> from tab1 a
>>>>   left join
>>>> tab2 b
>>>>   on (a.pid=b.pid or substr(a.pid,1,27)=b.pid);
>>>>
>>>> But Hive doesn't support "OR" in a left join condition.
>>>> Table a is huge, and table b has 40000 rows now (and will grow).
>>>> Is there any other way to achieve this?
>>>>
>>>> Thanks very much.
>>>>
>>>> --
>>>>
>>>>
>>>
>>>
>>> --
>>> Bertrand Dechoux
>>>
>>
>>
>>
>> --
>> /**********************************************************/
>> // Name: 周彩钦
>> // Phone: 15210364513
>> // E-mail: caiqinzhou@gmail.com
>> /**********************************************************/
>>
>
>
>
> --
> Bertrand Dechoux
>



-- 
/**********************************************************/
// Name: 周彩钦
// Phone: 15210364513
// E-mail: caiqinzhou@gmail.com
/**********************************************************/
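
The grouping Bertrand describes in the thread can be sketched in plain Python. This is only an illustration of the idea, not Hive's or Hadoop's actual implementation. The function names (`reduce_side_left_join`, `left_join_exact_or_prefix`) are hypothetical. Rows are assumed to be bare pid strings that are unique within table a. The second function shows one possible way around the OR: run one equality join per branch of the OR (full pid, then the pid prefix, as in `substr(a.pid,1,27)`) and merge the results. Whether this is practical for a huge table a is not something the thread settles.

```python
from collections import defaultdict


def reduce_side_left_join(rows_a, rows_b, key_a, key_b):
    """Left join done the group-by way: tag rows, group by key, pair per group."""
    # Each group holds (rows from a, rows from b) that share one key.
    groups = defaultdict(lambda: ([], []))
    # "Map" step: every row is emitted under exactly one key, chosen up front.
    for row in rows_a:
        groups[key_a(row)][0].append(row)
    for row in rows_b:
        groups[key_b(row)][1].append(row)
    # "Reduce" step: inside a group, emit every (row_a, row_b) pair.
    out = []
    for from_a, from_b in groups.values():
        for ra in from_a:
            if from_b:
                out.extend((ra, rb) for rb in from_b)
            else:
                out.append((ra, None))  # left join keeps unmatched a rows
    return out


def left_join_exact_or_prefix(rows_a, rows_b, prefix_len=27):
    """Emulate ON (a = b OR prefix(a) = b) with two equality joins.

    Assumption: pids in rows_a are unique, so set-based merging loses nothing.
    """
    exact = reduce_side_left_join(rows_a, rows_b, lambda a: a, lambda b: b)
    prefix = reduce_side_left_join(
        rows_a, rows_b, lambda a: a[:prefix_len], lambda b: b
    )
    # Keep each matching pair once, even if both branches found it.
    matched = {(a, b) for a, b in exact + prefix if b is not None}
    matched_a = {a for a, _ in matched}
    # An a row is unmatched only if neither branch found a partner.
    unmatched = {(a, None) for a in rows_a if a not in matched_a}
    return sorted(matched | unmatched, key=lambda p: (p[0], p[1] or ""))
```

Each call to `reduce_side_left_join` needs a single key per row, which is why the OR cannot be handled in one pass. Splitting it into two passes keeps every pass an equality join.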
