hive-user mailing list archives

From Zheng Shao <>
Subject Re: join in hive
Date Mon, 26 Oct 2009 04:04:26 GMT
Mostly correct.

2. Your idea looks interesting, but I would say that in practice the percentage
of tuples purged may not be that large.
4. Hive does NOT treat the partition column differently than others.
5. There is no sort-merge join yet. This would be a great feature to add
to Hive!
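A minimal Python sketch of the semi-join idea from point 2 of the question, working on in-memory lists of tuples rather than HDFS files. `semi_join_reduce` and `map_side_join` are hypothetical names for illustration, not Hive APIs. The semi-join keeps only the rows of one table whose key occurs in the other; the map-side join then builds a hash table from that (hopefully small) result and streams the large table past it. On a real cluster the semi-join costs an extra pass over the data, so it only pays off when a large fraction of rows is purged, which is the concern raised in reply 2.

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[Any, ...]


def semi_join_reduce(table: List[Row], other: List[Row], key: int, other_key: int) -> List[Row]:
    """Left semi-join: keep rows of `table` whose join key appears in `other`."""
    referenced = {row[other_key] for row in other}  # distinct join keys of the other side
    return [row for row in table if row[key] in referenced]


def map_side_join(large: List[Row], small: List[Row], large_key: int, small_key: int) -> List[Row]:
    """Hash join: build a lookup table on the small side, then stream the large side."""
    buckets: Dict[Any, List[Row]] = {}
    for row in small:
        buckets.setdefault(row[small_key], []).append(row)
    out: List[Row] = []
    for row in large:
        for match in buckets.get(row[large_key], []):
            out.append(row + match)
    return out


# Usage: shrink `customers` to the keys `orders` actually references, then map-join.
orders = [(1, "a"), (2, "b"), (3, "c"), (2, "d")]   # (customer_id, item)
customers = [(2, "Bob"), (3, "Cy"), (4, "Di")]      # (customer_id, name)
reduced = semi_join_reduce(customers, orders, 0, 0)
purged_fraction = 1 - len(reduced) / len(customers)  # share of rows the semi-join removed
result = map_side_join(orders, reduced, 0, 0)
```

Here only one of three customer rows is purged, so the extra pass would buy little; the trade-off depends entirely on the data.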

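For reference on point 5, a sketch of the textbook sort-merge equi-join in Python. The function name `sort_merge_join` is hypothetical, and this is the generic algorithm, not Hive code. Both inputs are sorted on the join key and merged in a single pass; each run of equal keys on the two sides produces its cross product. One attraction in a MapReduce setting is that inputs already sorted (and bucketed) on the key can skip the sort step. The merge loop below only handles equality; range predicates would need a modified merge.

```python
from typing import Any, List, Tuple

Row = Tuple[Any, ...]


def sort_merge_join(left: List[Row], right: List[Row], lk: int, rk: int) -> List[Row]:
    """Equi-join two inputs by sorting both on the key and merging in one pass."""
    ls = sorted(left, key=lambda r: r[lk])
    rs = sorted(right, key=lambda r: r[rk])
    out: List[Row] = []
    i = j = 0
    while i < len(ls) and j < len(rs):
        a, b = ls[i][lk], rs[j][rk]
        if a < b:
            i += 1  # left key too small: advance left
        elif a > b:
            j += 1  # right key too small: advance right
        else:
            # Find the run of equal keys on each side, then emit their cross product.
            i_end = i
            while i_end < len(ls) and ls[i_end][lk] == a:
                i_end += 1
            j_end = j
            while j_end < len(rs) and rs[j_end][rk] == a:
                j_end += 1
            for l_row in ls[i:i_end]:
                for r_row in rs[j:j_end]:
                    out.append(l_row + r_row)
            i, j = i_end, j_end
    return out


# Usage: duplicate keys on either side are handled by the run-based cross product.
left = [(3, "x"), (1, "y"), (2, "z"), (2, "w")]
right = [(2, "B"), (3, "C"), (3, "D"), (5, "E")]
pairs = sort_merge_join(left, right, 0, 0)
```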

2009/10/25 Gang Luo <>

> Hi everyone,
> I am planning to work on joins in Hive. Before I read the source code,
> could anyone tell me which kinds of join are implemented in the latest
> version of Hive?
> Here is what I know so far:
> 1. Symmetric join is implemented and is the default join.
> 2. Asymmetric join, a.k.a. map-side join (for joining a huge table with a
> small table using only the map phase), is implemented, but without further
> optimization. If so, my idea is that when joining two huge tables, we could
> first use a semi-join to remove the unreferenced tuples from one table,
> making it smaller, and then do the map-side join.
> 3. 3-way join (using only one map-reduce job to join 3 tables) is
> implemented, but only applies when joining on the same join key (A.k=B.k &&
> B.k=C.k). To join 3 tables on different join keys (A.k1=B.k1 &&
> B.k2=C.k2), we still need 2 map-reduce jobs.
> 4. When joining two tables, Hive can tell whether the join key is a
> partition column and take advantage of the partitioning.
> 5. No sort-merge join is implemented in Hive yet, so we cannot do
> inequality (non-equi) joins.
> There may be mistakes in my understanding. Please point them out or give
> me further information about joins in Hive. Thanks so much.
> Luo, Gang
> ---------
> Department of Computer Science
> Duke University
> (919)316-0993
> ------------------------------

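Points 1 and 3 of the question can be illustrated with a small simulation of a symmetric (reduce-side) join in Python. `reduce_side_join` is a hypothetical name, and the dictionary stands in for the shuffle; it does not model Hive's actual operators. The "map" step tags each row with its source table and routes it by join key; the "reduce" step receives all rows for one key and emits their cross product. Because every table is keyed on the same column, N tables can be joined in one job. With different keys (A.k1=B.k1 && B.k2=C.k2), the output of the first join would have to be re-shuffled on k2, which is why a second job is needed.

```python
from collections import defaultdict
from itertools import product
from typing import Any, Dict, List, Tuple

Row = Tuple[Any, ...]


def reduce_side_join(tables: List[List[Row]], key_cols: List[int]) -> List[Row]:
    """Simulate one map-reduce job that inner-joins several tables on a shared key."""
    # "Map" + "shuffle": route each row to its join key, tagged by source table index.
    shuffled: Dict[Any, List[List[Row]]] = defaultdict(lambda: [[] for _ in tables])
    for tag, (rows, key_col) in enumerate(zip(tables, key_cols)):
        for row in rows:
            shuffled[row[key_col]][tag].append(row)
    # "Reduce": per key, emit the cross product of the per-table row groups.
    # If any table has no rows for a key, product() yields nothing (inner-join semantics).
    out: List[Row] = []
    for groups in shuffled.values():
        for combo in product(*groups):
            out.append(tuple(value for row in combo for value in row))
    return out


# Usage: A.k = B.k && B.k = C.k, all on column 0, in a single simulated job.
a = [(1, "a1"), (2, "a2")]
b = [(1, "b1"), (1, "b2"), (3, "b3")]
c = [(1, "c1"), (2, "c2")]
joined = reduce_side_join([a, b, c], [0, 0, 0])
```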
