hive-user mailing list archives

From "Viraj Bhat" <>
Subject MapSide join in Hive
Date Thu, 24 Jun 2010 17:43:16 GMT
Hi all,

I am joining two datasets: one is around 1.5 TB in size and the other is
around 350 MB.

I want to do a map-side join between the two tables, using "id" as the
join column. I have read about map-side joins in Hive. Are there
technical specs for them on a wiki or in JIRA?

Here are some questions:

1) Do the tables need to be sorted on "id"?

2) Is there a restriction on the size of the smaller table?

Are there other join optimizations that Hive provides which I can apply?


