hive-dev mailing list archives

From "Namit Jain (JIRA)" <>
Subject [jira] Commented: (HIVE-917) Bucketed Map Join
Date Thu, 04 Feb 2010 00:11:28 GMT


Namit Jain commented on HIVE-917:

Started looking at the patch:

1. Can you change the test to add an EXPLAIN for all queries?
2. Can you change the default value of hive.optimize.bucketmapjoin to true ?
3. Instead of changing the existing ones, can you create the tables/partitions
you need in the new test only?

> Bucketed Map Join
> -----------------
>                 Key: HIVE-917
>                 URL:
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Zheng Shao
>         Attachments: hive-917-2010-2-3.patch
> Hive already has support for map-join. Map-join treats the big table as job input, and
in each mapper, it loads all data from a small table.
> In case the big table is already bucketed on the join key, we don't have to load the
whole small table in each of the mappers. This will greatly alleviate the memory pressure,
and make map-join work with medium-sized tables.
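
As a rough illustration of the plain map-join described above, the sketch below builds an in-memory lookup from the entire small table and streams one split of the big table through it. The function name and the (key, value) row layout are assumptions made for this example, not Hive's actual implementation.

```python
def map_join(big_split, small_table):
    """One mapper's work in a plain map-join (illustrative only)."""
    # Every mapper loads the whole small table into memory.
    lookup = {}
    for key, value in small_table:
        lookup.setdefault(key, []).append(value)
    # The big-table split is streamed and probed against the lookup.
    return [(k, v, sv) for k, v in big_split for sv in lookup.get(k, [])]
```

The memory cost per mapper is proportional to the full small table, which is the pressure the bucketed variant aims to reduce.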
> There are 4 steps we can improve:
> S0. This is what the user can already do now: create a new bucketed table and insert
all data from the small table to it; Submit BUCKETNUM jobs, each doing a map-side join of
> S1. Change the code so that when map-join is loading the small table, we automatically
drop the rows with the keys that are NOT in the same bucket as the big table. This should
alleviate the problem on memory, but we might still have thousands of mappers reading the
whole of the small table.
> S2. Let's say the user has already bucketed the small table on the join key into exactly
the same number of buckets (or a factor of the number of buckets of the big table), then
map-join can choose to load only the buckets that are useful.
> S3. Add a new hint (e.g. /*+ MAPBUCKETJOIN(a) */), so that Hive automatically does S2,
without needing to ask the user to create a temporary bucketed table for the small table.
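
The bucket selection in S2 can be sketched as follows. This is a minimal illustration under stated assumptions: keys are integers, a row's bucket is simply key modulo the bucket count (Hive's real bucketing function may differ), and all function names are hypothetical. When the small table's bucket count divides the big table's, every key in big-table bucket b falls in small-table bucket b mod S, so each mapper needs only that one bucket.

```python
from collections import defaultdict

def bucket_of(key, num_buckets):
    # Assumption: integer keys bucketed by plain modulo.
    return key % num_buckets

def bucketize(rows, num_buckets):
    """Group (key, value) rows by bucket number."""
    buckets = defaultdict(list)
    for key, value in rows:
        buckets[bucket_of(key, num_buckets)].append((key, value))
    return buckets

def small_bucket_for(big_bucket, small_num_buckets, big_num_buckets):
    """Return the small-table bucket that can match a given big-table bucket."""
    if big_num_buckets % small_num_buckets != 0:
        raise ValueError("small bucket count must divide big bucket count")
    return big_bucket % small_num_buckets

def bucketed_map_join(big_rows, small_rows, big_num_buckets, small_num_buckets):
    big = bucketize(big_rows, big_num_buckets)
    small = bucketize(small_rows, small_num_buckets)
    joined = []
    # Each loop iteration stands in for one mapper reading one big-table bucket.
    for b, rows in sorted(big.items()):
        s = small_bucket_for(b, small_num_buckets, big_num_buckets)
        # Only the matching small-table bucket is loaded into memory.
        lookup = defaultdict(list)
        for key, value in small.get(s, []):
            lookup[key].append(value)
        for key, value in rows:
            for small_value in lookup.get(key, []):
                joined.append((key, value, small_value))
    return joined
```

For example, with 4 big-table buckets and 2 small-table buckets, the mapper for big-table bucket 3 loads only small-table bucket 1, roughly half of the small table instead of all of it.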

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
