hadoop-hive-dev mailing list archives

From Carl Steinbach <c...@cloudera.com>
Subject Re: How HIVE manages a join
Date Tue, 10 Aug 2010 21:15:52 GMT
Hi Yongqiang,

Please go ahead and update the wiki page. I will copy it over to version
control when you are done.

Thanks.

Carl

On Tue, Aug 10, 2010 at 2:11 PM, yongqiang he <heyongqiangict@gmail.com> wrote:

> The Hive Join wiki page says
> "THIS PAGE WAS MOVED TO HIVE XDOCS ! DO NOT EDIT!" (just above the "Join Syntax" heading).
>
> Where should I do the update?
>
> On Fri, Aug 6, 2010 at 11:46 PM, yongqiang he <heyongqiangict@gmail.com> wrote:
> > Yeah. The sort merge bucket mapjoin has been finished for some time
> > and seems stable now. I did one skew join but haven't gotten a chance to
> > look at the other skew join Namit mentioned to me. But I definitely should
> > have updated the wiki earlier. My bad.
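A sort merge bucket map join relies on both tables being bucketed and sorted on the join key when they are written (in Hive DDL, `CLUSTERED BY (col) SORTED BY (col) INTO n BUCKETS`). Assuming both tables use the same number of buckets and the same hash function, bucket i of one table can only match bucket i of the other, so each pair of buckets can be merged in a single pass by a mapper, with no shuffle. The plain-Python sketch below illustrates only that idea; `bucketize`, `merge_join`, and `sort_merge_bucket_join` are hypothetical helpers, rows are tuples, keys are column indexes, and Python's `hash` stands in for Hive's bucketing hash.

```python
def bucketize(rows, key, num_buckets):
    # Hypothetical stand-in for writing a table CLUSTERED BY/SORTED BY the key:
    # rows go to bucket hash(key) % num_buckets, and each bucket is sorted by key.
    buckets = [[] for _ in range(num_buckets)]
    for r in rows:
        buckets[hash(r[key]) % num_buckets].append(r)
    return [sorted(b, key=lambda r: r[key]) for b in buckets]


def merge_join(a_sorted, b_sorted, a_key, b_key):
    # Classic sort-merge join over two inputs already sorted on the join key.
    out, i, j = [], 0, 0
    while i < len(a_sorted) and j < len(b_sorted):
        ka, kb = a_sorted[i][a_key], b_sorted[j][b_key]
        if ka < kb:
            i += 1
        elif ka > kb:
            j += 1
        else:
            # Find the run of equal keys on each side, then emit their cross product.
            i_end = i
            while i_end < len(a_sorted) and a_sorted[i_end][a_key] == ka:
                i_end += 1
            j_end = j
            while j_end < len(b_sorted) and b_sorted[j_end][b_key] == ka:
                j_end += 1
            out.extend((a, b) for a in a_sorted[i:i_end] for b in b_sorted[j:j_end])
            i, j = i_end, j_end
    return out


def sort_merge_bucket_join(a_buckets, b_buckets, a_key, b_key):
    # Bucket i of A can only match bucket i of B, so in a cluster each pair
    # could be merged by a separate mapper; here they are merged one after another.
    out = []
    for a_bucket, b_bucket in zip(a_buckets, b_buckets):
        out.extend(merge_join(a_bucket, b_bucket, a_key, b_key))
    return out
```

For example, `sort_merge_bucket_join(bucketize(A, 0, 2), bucketize(B, 0, 2), 0, 0)` joins two small tuple lists on their first column. The merge step never holds more than the current run of equal keys from each side, which is why this join scales to two large tables as long as they were bucketed and sorted consistently up front.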
> >
> > On Fri, Aug 6, 2010 at 8:32 PM, Jeff Hammerbacher <hammer@cloudera.com> wrote:
> >> Yongqiang mentioned he was going to update the wiki with this information in
> >> the thread at http://hadoop.markmail.org/thread/hxd4uwwukuo46lgw.
> >>
> >> Yongqiang, have you had a chance to complete the sort merge bucket map
> >> join and the other skew join you mentioned in the above thread?
> >>
> >> Thanks,
> >> Jeff
> >>
> >> On Fri, Aug 6, 2010 at 3:43 AM, bharath vissapragada
> >> <bharat_v@students.iiit.ac.in> wrote:
> >>>
> >>> Roberto,
> >>>
> >>> You may find these links useful:
> >>>
> >>> http://www.slideshare.net/ragho/hive-icde-2010?src=related_normal&rel=2374551
> >>> - Simple joins and optimizations
> >>>
> >>> http://www.slideshare.net/zshao/hive-user-meeting-march-2010-hive-team
> >>> - New kinds of joins / features of Hive
> >>>
> >>> Thanks
> >>>
> >>> Bharath.V
> >>> 4th-year undergraduate
> >>> IIIT Hyderabad
> >>>
> >>> On Fri, Aug 6, 2010 at 12:16 PM, Cappa Roberto
> >>> <roberto.cappa@guest.telecomitalia.it> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> I cannot find any documentation about which algorithm HIVE uses to
> >>>> translate JOIN clauses into Map-Reduce tasks.
> >>>>
> >>>> In particular, suppose I have two tables A and B, each stored in a
> >>>> separate file, and each file is split across Hadoop nodes. When I
> >>>> perform a JOIN on A.column = B.column, the framework has to compare the
> >>>> full data of the first file with the full data of the second file. How
> >>>> can Hadoop scan all possible combinations of values? If each node
> >>>> holds only a portion of each file, a complete comparison seems
> >>>> impossible. Is one of the two files entirely replicated on each node?
> >>>> Or does HIVE use another kind of strategy/optimization?
> >>>>
> >>>> Thanks.
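Roughly speaking, Hive's default (common) join avoids comparing every row against every row: it runs as a reduce-side join in which mappers emit each row under its join key, tagged with its source table, and the shuffle routes equal keys from both tables to the same reducer, no matter which node's split they came from. When one table is small enough, a map join instead replicates that table to every mapper as an in-memory hash table, which is the "entirely replicated" strategy asked about above. The Python below is a minimal single-process sketch of both ideas under simplifying assumptions (rows are tuples, keys are column indexes, the function names are hypothetical); it is not Hive's implementation.

```python
from collections import defaultdict
from itertools import product


def reduce_side_join(a_rows, b_rows, a_key, b_key, num_reducers=2):
    # Map phase: tag each row with its source table and emit it under its join key.
    mapped = [(r[a_key], ("A", r)) for r in a_rows] + [(r[b_key], ("B", r)) for r in b_rows]
    # Shuffle: partition by hash of the key, so equal keys from either table
    # end up at the same reducer regardless of where the rows were stored.
    reducers = [defaultdict(lambda: {"A": [], "B": []}) for _ in range(num_reducers)]
    for key, (tag, row) in mapped:
        reducers[hash(key) % num_reducers][key][tag].append(row)
    # Reduce phase: each reducer walks its keys in sorted order and pairs
    # every A row with every B row that shares the key.
    out = []
    for partition in reducers:
        for key in sorted(partition):
            group = partition[key]
            out.extend(product(group["A"], group["B"]))
    return out


def map_side_join(a_rows, b_rows, a_key, b_key):
    # Build a hash table from the small table B; in a cluster this table
    # would be shipped to every mapper, so no shuffle or reduce phase is needed.
    table = defaultdict(list)
    for r in b_rows:
        table[r[b_key]].append(r)
    # Each mapper streams its split of the large table A and probes the table.
    return [(a, b) for a in a_rows for b in table.get(a[a_key], [])]
```

Both functions return the same (A row, B row) pairs. The trade-off is that the reduce-side join moves both tables across the network but works for any table sizes, while the map-side join moves only the small table and works only if it fits in each mapper's memory.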
> >>
> >>
> >
>
