hive-dev mailing list archives

From "alex gemini (JIRA)" <>
Subject [jira] [Commented] (HIVE-3086) Skewed Join Optimization
Date Tue, 26 Jun 2012 09:05:44 GMT


alex gemini commented on HIVE-3086:

The design is very complicated, IMO. What if we have a big table logs and a small table users,
and table users has a column 'age'? If we issue a query that is skewed by age, we can't
pre-partition the big table by it. This design doesn't handle that case, right? I guess what we
want is custom partitioning at runtime. For the above example, we need a custom partition (or some
hint) to tell the query plan that we want to partition the users table on the 'userid,age' columns
and the logs table on the 'userid' column. The partition number for the same userid must be the
same in both tables for the subsequent join.
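To make the co-partitioning requirement concrete, here is a minimal Python sketch. It is not Hive code and not the HIVE-3086 design; the table contents and the helpers partition_for and co_partitioned_join are hypothetical. Both tables are routed by a function of userid alone, which is what guarantees that matching rows get the same partition number. How 'age' would then be used to split skewed partitions is not specified in the comment, so the sketch leaves it out.

```python
from collections import defaultdict


def partition_for(userid: int, num_partitions: int) -> int:
    # Hypothetical partitioner: both tables use the SAME function of userid
    # only, so rows with matching userid always get the same partition number.
    return userid % num_partitions


def co_partitioned_join(users, logs, num_partitions):
    """users: list of (userid, age); logs: list of (userid, event).

    Returns (userid, age, event) rows, joining each partition independently.
    """
    user_parts = defaultdict(list)
    log_parts = defaultdict(list)
    for userid, age in users:
        user_parts[partition_for(userid, num_partitions)].append((userid, age))
    for userid, event in logs:
        log_parts[partition_for(userid, num_partitions)].append((userid, event))

    result = []
    for p in range(num_partitions):
        # Build a lookup from the small side of this partition only;
        # no rows ever need to cross partition boundaries.
        ages = defaultdict(list)
        for userid, age in user_parts[p]:
            ages[userid].append(age)
        for userid, event in log_parts[p]:
            for age in ages.get(userid, []):
                result.append((userid, age, event))
    return result
```

With users = [(1, 30), (2, 30), (3, 45)] and logs = [(1, "login"), (1, "click"), (3, "logout"), (4, "login")], the per-partition join returns the same three rows as a global join (up to ordering) for 2 or 3 partitions. If the two tables were routed by different functions, matching rows could land in different partitions and be silently dropped.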
> Skewed Join Optimization
> ------------------------
>                 Key: HIVE-3086
>                 URL:
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Nadeem Moidu
>            Assignee: Nadeem Moidu
> During a join operation, if one of the columns has a skewed key, it can cause that particular
> reducer to become the bottleneck. The following feature will address it:
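The bottleneck described in the quoted description follows from how a shuffle assigns keys: every row with a given join key goes to the same reducer. A minimal Python sketch, assuming Hadoop-style hash partitioning (hash of the key modulo the number of reducers) and made-up non-negative integer keys; reducer_loads is a hypothetical helper:

```python
from collections import Counter


def reducer_loads(join_keys, num_reducers):
    """Count rows per reducer when each row is routed by hash(key) % num_reducers.

    Assumes non-negative integer keys, whose Python hash is the value itself,
    so the result is deterministic.
    """
    loads = Counter()
    for key in join_keys:
        loads[hash(key) % num_reducers] += 1
    return [loads[r] for r in range(num_reducers)]
```

For example, reducer_loads([7] * 900 + list(range(100)), 4) returns [25, 25, 25, 925]: the single hot key 7 puts 925 of the 1,000 rows on one reducer, and adding more reducers cannot spread the 900 rows of key 7.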


