hadoop-hive-dev mailing list archives

From yongqiang he <heyongqiang...@gmail.com>
Subject Re: How HIVE manages a join
Date Tue, 10 Aug 2010 21:11:37 GMT
The Hive Join wiki page says
"THIS PAGE WAS MOVED TO HIVE XDOCS ! DO NOT EDIT! Join Syntax"

Where should I make the update?

On Fri, Aug 6, 2010 at 11:46 PM, yongqiang he <heyongqiangict@gmail.com> wrote:
> Yeah. The sort merge bucket mapjoin has been finished for some time
> and seems stable now. I did one skew join but haven't gotten a chance to
> look at the other skew join Namit mentioned to me. But I definitely should
> have updated the wiki earlier. My bad.
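For readers who have not met the sort merge bucket map join: when both tables are bucketed on the join key into the same number of buckets and every bucket is sorted on that key, rows in bucket i of one table can only match rows in bucket i of the other. Each mapper can then merge one pair of sorted buckets in a single pass, with no shuffle. The Python below is a minimal single-process sketch under those assumptions; the helper names (`bucketize`, `merge_join_bucket`, `smb_join`) and the `hash(key) % n` bucketing are illustrative, not Hive's actual code.

```python
from typing import List, Tuple

Row = Tuple[int, str]  # (join_key, payload); int keys so buckets can be sorted


def bucketize(rows: List[Row], num_buckets: int) -> List[List[Row]]:
    """Hash-partition rows on the join key, then sort each bucket by key.
    Illustrative stand-in for a table bucketed and sorted on the join key."""
    buckets: List[List[Row]] = [[] for _ in range(num_buckets)]
    for key, payload in rows:
        buckets[hash(key) % num_buckets].append((key, payload))
    for bucket in buckets:
        bucket.sort(key=lambda r: r[0])
    return buckets


def merge_join_bucket(left: List[Row], right: List[Row]) -> List[Tuple[int, str, str]]:
    """Single-pass merge join of two buckets already sorted on the key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side and emit their cross product.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for _, lp in left[i:i_end]:
                for _, rp in right[j:j_end]:
                    out.append((lk, lp, rp))
            i, j = i_end, j_end
    return out


def smb_join(left_buckets: List[List[Row]], right_buckets: List[List[Row]]):
    """Join bucket i with bucket i; each pair could run in its own mapper."""
    assert len(left_buckets) == len(right_buckets), "bucket counts must match"
    out = []
    for lb, rb in zip(left_buckets, right_buckets):
        out.extend(merge_join_bucket(lb, rb))
    return out


a = [(1, "a1"), (2, "a2"), (2, "a2b"), (3, "a3")]
b = [(2, "b2"), (3, "b3"), (4, "b4")]
print(sorted(smb_join(bucketize(a, 2), bucketize(b, 2))))
# [(2, 'a2', 'b2'), (2, 'a2b', 'b2'), (3, 'a3', 'b3')]
```

The bucket counts must agree (in this sketch, exactly) so that matching keys are guaranteed to land in corresponding buckets.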
>
> On Fri, Aug 6, 2010 at 8:32 PM, Jeff Hammerbacher <hammer@cloudera.com> wrote:
>> Yongqiang mentioned he was going to update the wiki with this information in
>> the thread at http://hadoop.markmail.org/thread/hxd4uwwukuo46lgw.
>>
>> Yongqiang, have you gotten a chance to complete the sort merge bucket map
>> join and the other skew join you mention in the above thread?
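The skew join is aimed at join keys that occur so often that the one reducer receiving them becomes a straggler. As I understand the general idea, rows for such keys are separated out and joined with a map-side (hash) join, while all other keys go through the normal shuffle join. The sketch below simulates that in Python; `skew_threshold` and the function names are hypothetical, skew is detected here by counting only the left (large) table up front, and whatever bookkeeping Hive does at run time is not modelled.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Tuple

Row = Tuple[Hashable, str]  # (join_key, payload)


def shuffle_join(left: List[Row], right: List[Row]) -> List[Tuple[Hashable, str, str]]:
    """Reduce-side join: group both sides by key, then cross product per key."""
    groups: Dict[Hashable, Tuple[List[str], List[str]]] = defaultdict(lambda: ([], []))
    for k, v in left:
        groups[k][0].append(v)
    for k, v in right:
        groups[k][1].append(v)
    return [(k, lv, rv) for k, (ls, rs) in groups.items() for lv in ls for rv in rs]


def skew_join(left: List[Row], right: List[Row], skew_threshold: int):
    """Hypothetical skew handling: hot keys bypass the shuffle."""
    counts = Counter(k for k, _ in left)
    skewed = {k for k, c in counts.items() if c > skew_threshold}

    # Normal path: non-skewed keys go through the shuffle join.
    out = shuffle_join([r for r in left if r[0] not in skewed],
                       [r for r in right if r[0] not in skewed])

    # Skewed path: hash the right-side rows for hot keys in memory and
    # stream the large left-side rows past them, as a map join would.
    small: Dict[Hashable, List[str]] = defaultdict(list)
    for k, v in right:
        if k in skewed:
            small[k].append(v)
    for k, v in left:
        if k in skewed:
            for rv in small.get(k, []):
                out.append((k, v, rv))
    return out


left = [("hot", f"l{i}") for i in range(5)] + [("cold", "lc")]
right = [("hot", "rh"), ("cold", "rc"), ("none", "rn")]
print(len(skew_join(left, right, skew_threshold=3)))
# 6
```

The result is the same set of rows a plain shuffle join would produce; only the route the hot key takes changes.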
>>
>> Thanks,
>> Jeff
>>
>> On Fri, Aug 6, 2010 at 3:43 AM, bharath vissapragada
>> <bharat_v@students.iiit.ac.in> wrote:
>>>
>>> Roberto,
>>>
>>> You may find these links useful:
>>>
>>>
>>> http://www.slideshare.net/ragho/hive-icde-2010?src=related_normal&rel=2374551
>>> - Simple joins and optimizations
>>>
>>> http://www.slideshare.net/zshao/hive-user-meeting-march-2010-hive-team
>>> - New kinds of joins / features of Hive
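One of the optimizations referred to above is the map join, which also addresses the replication question further down in this thread. If one table is small enough to fit in memory, it is loaded into a hash table on every mapper, and each mapper streams its split of the large table against that copy, so no reduce phase is needed. In Hive this is requested with a hint such as `SELECT /*+ MAPJOIN(b) */ ...`. The Python below is a minimal single-process sketch of the idea; `build_hash_table`, `map_join_split` and `map_join` are hypothetical names.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Row = Tuple[Hashable, str]  # (join_key, payload)


def build_hash_table(small_table: Iterable[Row]) -> Dict[Hashable, List[str]]:
    """Index the small table by join key; this copy is what every mapper gets."""
    table: Dict[Hashable, List[str]] = {}
    for key, value in small_table:
        table.setdefault(key, []).append(value)
    return table


def map_join_split(big_split: Iterable[Row], small_hash: Dict[Hashable, List[str]]):
    """One mapper: stream its split of the big table and probe the hash table."""
    out = []
    for key, value in big_split:
        for small_value in small_hash.get(key, []):
            out.append((key, value, small_value))
    return out


def map_join(big_splits: List[List[Row]], small_table: List[Row]):
    small_hash = build_hash_table(small_table)  # "replicated" to every mapper
    out = []
    for split in big_splits:  # each split would be a separate map task
        out.extend(map_join_split(split, small_hash))
    return out


big_splits = [[(1, "a1"), (2, "a2")], [(2, "a2b"), (3, "a3")]]
small = [(2, "b2"), (3, "b3"), (4, "b4")]
print(sorted(map_join(big_splits, small)))
# [(2, 'a2', 'b2'), (2, 'a2b', 'b2'), (3, 'a3', 'b3')]
```

The trade-off is memory: the whole small table must fit on each mapper, which is why this is an optimization for small-to-large joins rather than the general strategy.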
>>>
>>> Thanks
>>>
>>> Bharath.V
>>> 4th-year undergraduate
>>> IIIT Hyderabad
>>>
>>> On Fri, Aug 6, 2010 at 12:16 PM, Cappa Roberto
>>> <roberto.cappa@guest.telecomitalia.it> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I cannot find any documentation about which algorithm Hive uses to
>>>> translate JOIN clauses into Map-Reduce tasks.
>>>>
>>>> In particular, if I have two tables A and B, each table is written to a
>>>> separate file and each file is split across Hadoop nodes. When I perform a
>>>> JOIN with A.column = B.column, the framework has to compare the full data of
>>>> the first file with the full data of the second file. How can Hadoop perform
>>>> a full scan of all possible combinations of values? If each node contains
>>>> only a portion of each file, a complete comparison does not seem possible.
>>>> Is one of the two files entirely replicated on each node? Or does Hive use
>>>> another kind of strategy/optimization?
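To make the answer to this question concrete: in Hive's default (common) join, neither file has to be replicated. Each mapper reads one split of A or B and emits the join key as the map output key, with the row tagged by the table it came from; the shuffle then sends every row with a given key, from both tables, to the same reducer, which pairs them up. The Python below simulates those three phases in one process. The function names and `num_reducers` are illustrative assumptions, and the reducer simply keeps both sides in lists, whereas Hive, as far as I know, buffers all but the last table in the join and streams the last one.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

Row = Tuple[Hashable, str]  # (join_key, payload)


def map_phase(table_tag: int, split: Iterable[Row]):
    """One mapper: emit (join_key, (tag, value)) for each row of its split."""
    for key, value in split:
        yield key, (table_tag, value)


def shuffle(pairs, num_reducers: int) -> List[Dict[Hashable, list]]:
    """Partition on hash(join_key): equal keys from both tables meet at one reducer."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, tagged in pairs:
        partitions[hash(key) % num_reducers][key].append(tagged)
    return partitions


def reduce_phase(partition: Dict[Hashable, list]):
    """One reducer: for each key, cross product of A-rows and B-rows."""
    out = []
    for key, tagged_values in partition.items():
        a_rows = [v for tag, v in tagged_values if tag == 0]
        b_rows = [v for tag, v in tagged_values if tag == 1]
        out.extend((key, av, bv) for av in a_rows for bv in b_rows)
    return out


def common_join(splits_a: List[List[Row]], splits_b: List[List[Row]], num_reducers: int = 3):
    pairs = []
    for split in splits_a:
        pairs.extend(map_phase(0, split))
    for split in splits_b:
        pairs.extend(map_phase(1, split))
    out = []
    for partition in shuffle(pairs, num_reducers):
        out.extend(reduce_phase(partition))
    return out


# Table A is split across two "nodes", table B across two others.
splits_a = [[(1, "a1"), (2, "a2")], [(2, "a2b"), (3, "a3")]]
splits_b = [[(2, "b2")], [(3, "b3"), (4, "b4")]]
print(sorted(common_join(splits_a, splits_b)))
# [(2, 'a2', 'b2'), (2, 'a2b', 'b2'), (3, 'a3', 'b3')]
```

No node ever needs all of A or all of B: the partitioning on the join key is what guarantees that every matching pair ends up at the same reducer.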
>>>>
>>>> Thanks.
>>
>>
>
