hadoop-hive-dev mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: How HIVE manages a join
Date Tue, 10 Aug 2010 21:57:03 GMT
This page is already in version control:

/home/edward/cassandra-handler/docs/xdocs/language_manual/joins.xml

Edward

On Tue, Aug 10, 2010 at 5:15 PM, Carl Steinbach <carl@cloudera.com> wrote:
> Hi Yongqiang,
> Please go ahead and update the wiki page. I will copy it over to version
> control when you are done.
> Thanks.
> Carl
>
> On Tue, Aug 10, 2010 at 2:11 PM, yongqiang he <heyongqiangict@gmail.com>
> wrote:
>>
>> In the Hive Join wiki page, it says
>> "THIS PAGE WAS MOVED TO HIVE XDOCS ! DO NOT EDIT!Join Syntax"
>>
>> Where should I do the update?
>>
>> On Fri, Aug 6, 2010 at 11:46 PM, yongqiang he <heyongqiangict@gmail.com>
>> wrote:
>> > Yeah. The sort merge bucket mapjoin has been finished for some time,
>> > and seems stable now. I did one skew join but haven't had a chance to
>> > look at the other skew join Namit mentioned to me. But I definitely
>> > should have updated the wiki earlier. My bad.
>> >
>> > On Fri, Aug 6, 2010 at 8:32 PM, Jeff Hammerbacher <hammer@cloudera.com>
>> > wrote:
>> >> Yongqiang mentioned he was going to update the wiki with this
>> >> information in the thread at
>> >> http://hadoop.markmail.org/thread/hxd4uwwukuo46lgw.
>> >>
>> >> Yongqiang, have you gotten a chance to complete the sort merge bucket
>> >> map join and the other skew join you mention in the above thread?
>> >>
>> >> Thanks,
>> >> Jeff
>> >>
>> >> On Fri, Aug 6, 2010 at 3:43 AM, bharath vissapragada
>> >> <bharat_v@students.iiit.ac.in> wrote:
>> >>>
>> >>> Roberto ..
>> >>>
>> >>> You can find these links useful ..
>> >>>
>> >>>
>> >>>
>> >>> http://www.slideshare.net/ragho/hive-icde-2010?src=related_normal&rel=2374551
>> >>> - Simple joins and optimizations..
>> >>>
>> >>>
>> >>> http://www.slideshare.net/zshao/hive-user-meeting-march-2010-hive-team
>> >>> - New kind of joins / features of hive ..
>> >>>
>> >>> Thanks
>> >>>
>> >>> Bharath.V
>> >>> 4th year Undergraduate..
>> >>> IIIT Hyderabad
>> >>>
>> >>> On Fri, Aug 6, 2010 at 12:16 PM, Cappa Roberto
>> >>> <roberto.cappa@guest.telecomitalia.it> wrote:
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>> I cannot find any documentation about what algorithm HIVE uses to
>> >>>> translate JOIN clauses into Map-Reduce tasks.
>> >>>>
>> >>>> In particular, if I have two tables A and B, each table is written to
>> >>>> a separate file, and each file is split across hadoop nodes. When I
>> >>>> perform a JOIN with A.column = B.column, the framework has to compare
>> >>>> the full data from the first file with the full data from the second
>> >>>> file. How can hadoop perform a full scan of all possible combinations
>> >>>> of values? If each node contains a portion of each file, a complete
>> >>>> comparison does not seem possible. Is one of the two files entirely
>> >>>> replicated on each node? Or does HIVE use another kind of
>> >>>> strategy/optimization?
>> >>>>
>> >>>> Thanks.
>> >>
>> >>
>> >
>
>
