hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Liyin Tang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1642) Convert join queries to map-join based on size of table/row
Date Sat, 13 Nov 2010 01:16:15 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931599#action_12931599
] 

Liyin Tang commented on HIVE-1642:
----------------------------------

I just finished converting common join into map join based on the file size.  There are 2
flags to control this optimization.
1)	set hive.auto.convert.join = true; It means this optimization is enabled. By default right
now, this flag is disabled in order not to break any existing test cases. Also I put 25 additional
test cases, auto_join0.q - auto_join25.q, which covers this optimization code.
2)	Set hive.hashtable.max.memory.usage = 0.9;  It means if the memory usage of local task
is more than 90% of its heap size, then the local task will abort by itself. The Driver will
know the local work fails and it won't submit the MapJoinTask (a Map Only MapRedTask)  to
Hadoop, but instead, it will submit the originally CommonJoinTask to Hadoop to run.
3)	Set hive.smalltable.filesize = 25000000L;  It means if the summary of the small table file
size is less than 25M, then it will run the map join task. If not, just run the originally
common join task.
 The following is the basic flow how it works. For each common join, create a conditional
task.
1)	For each join table, generate a mapjoin task by assuming this table is big table. 
a.	The left side of right outer join must be small table.
b.	The right side of left outer join must be small table.
c.	No full outer join can be optimized. 
d.	Eg. A left outer join B right outer join C. Only C can be big table table.
e.	Eg. A right outer join B left outer join C. Only B can be big table table.
f.	Eg. A left outer join B left outer join C. Only A can be big table table.
g.	Eg. A right outer join B right outer join C. Both B and C can be big table table.
2)	Put all these generated map join tasks into conditional task and set the mapping between
big table's alias with the corresponding map join task.
3)	During the execution time, the resolver will read the input file size. If the input file
size of small table is less than a threshold, than run the converted map join task. 
4)	Set each map join task with a backup task. The backup task is the originally common join
task.
This mapping relationship is set during execution time.
5)	If the map join task return abnormally, launch the backup task.



> Convert join queries to map-join based on size of table/row
> -----------------------------------------------------------
>
>                 Key: HIVE-1642
>                 URL: https://issues.apache.org/jira/browse/HIVE-1642
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Liyin Tang
>             Fix For: 0.7.0
>
>
> Based on the number of rows and size of each table, Hive should automatically be able
to convert a join into map-join.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message