hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/JoinOptimization" by Kirk True
Date Mon, 27 Dec 2010 20:13:13 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/JoinOptimization" page has been changed by Kirk True.
The comment on this change is: Fixed what appears to be a copy-and-paste error with the Jira
links.
http://wiki.apache.org/hadoop/Hive/JoinOptimization?action=diff&rev1=12&rev2=13

--------------------------------------------------

  
  In Fig 1 above, the previous map join implementation does not scale well when the larger table is huge, because each Mapper reads the small table's data directly from HDFS. When the larger table is huge, thousands of Mappers are launched to read different records of the larger table, and all of those Mappers read the small table's data from HDFS into their memory. This can make access to the small table the performance bottleneck, and sometimes Mappers hit so many time-outs while reading this small file that the task fails.
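  
  To make the bottleneck concrete, here is a sketch of such a map join, using hypothetical table names ('''''sales''''' as the huge table, '''''cities''''' as the small one):
  {{{
  -- Hypothetical tables: 'sales' is huge, 'cities' is small.
  -- The mapjoin hint asks Hive to hold 'cities' in each Mapper's memory;
  -- with thousands of Mappers, every one of them reads 'cities' from HDFS itself.
  select /*+ mapjoin(c) */ s.*, c.name
  from sales s join cities c on (s.city_id = c.city_id);
  }}}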
  
- Hive-1641 ([[http://issues.apache.org/jira/browse/HIVE-1293|http://issues.apache.org/jira/browse/HIVE-1641]])
has solved this problem, as shown in Fig 2 below.
+ Hive-1641 ([[http://issues.apache.org/jira/browse/HIVE-1641|http://issues.apache.org/jira/browse/HIVE-1641]])
has solved this problem, as shown in Fig 2 below.
  
  {{attachment:fig2.jpg||height="881px",width="1184px"}}
  
@@ -23, +23 @@

  Obviously, the Local Task is very memory intensive, so the query processor launches it in a child JVM that has the same heap size as the Mapper's. Since the Local Task may run out of memory, the query processor measures its memory usage very carefully. Once the memory usage of the Local Task rises above a threshold, the Local Task aborts itself and tells the user that the table is too large to hold in memory. Users can change this threshold with '''''set hive.mapjoin.localtask.max.memory.usage = 0.999;'''''
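  
  As a minimal sketch, the threshold is adjusted per session before the join runs; the 0.95 below is purely illustrative, not a recommendation:
  {{{
  -- Abort the Local Task once it has used more than 95% of its heap
  -- (illustrative value; choose one that suits your heap size).
  set hive.mapjoin.localtask.max.memory.usage = 0.95;
  }}}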
  
  == 1.2 Removing JDBM ==
- Previously, Hive used JDBM ([[http://issues.apache.org/jira/browse/HIVE-1293|http://jdbm.sourceforge.net/]]) as a persistent hashtable: whenever the in-memory hashtable could not hold any more data, it swapped the key/value pairs into the JDBM table. However, when profiling the Map Join, we found that this JDBM component took more than 70% of the CPU time, as shown in Fig 3. Also, the persistent file JDBM generates is too large to put into the Distributed Cache; for example, putting 67,000 simple integer key/value pairs into JDBM generated a hashtable file of more than 22 MB. So JDBM is too heavyweight for Map Join, and it is better to remove this component from Hive. Map Join is designed to hold the small table's data in memory; if the table is too large to hold, the query just runs as a Common Join, so there is no need for a persistent hashtable any more. JDBM was removed in Hive-1754 ([[http://issues.apache.org/jira/browse/HIVE-1293|http://issues.apache.org/jira/browse/HIVE-1754]]).
+ Previously, Hive used JDBM ([[http://jdbm.sourceforge.net/|http://jdbm.sourceforge.net/]]) as a persistent hashtable: whenever the in-memory hashtable could not hold any more data, it swapped the key/value pairs into the JDBM table. However, when profiling the Map Join, we found that this JDBM component took more than 70% of the CPU time, as shown in Fig 3. Also, the persistent file JDBM generates is too large to put into the Distributed Cache; for example, putting 67,000 simple integer key/value pairs into JDBM generated a hashtable file of more than 22 MB. So JDBM is too heavyweight for Map Join, and it is better to remove this component from Hive. Map Join is designed to hold the small table's data in memory; if the table is too large to hold, the query just runs as a Common Join, so there is no need for a persistent hashtable any more. JDBM was removed in Hive-1754 ([[http://issues.apache.org/jira/browse/HIVE-1754|http://issues.apache.org/jira/browse/HIVE-1754]]).
  
  {{attachment:fig3.jpg}}
  
@@ -42, +42 @@

  == 2.1 New Join Execution Flow ==
  Since a map join is faster than the common join, it is better to run the map join whenever possible. Previously, Hive users needed to give a hint in the query to specify which table is the small table, for example: '''''select /*+ mapjoin(x) */ * from src1 x join src2 y on x.key = y.key;''''' This is bad for both user experience and query performance, because users may give a wrong hint, or may not give any hint at all. It would be much better to convert the Common Join into a Map Join without the user's hint.
  
- Hive-1642 ([[http://issues.apache.org/jira/browse/HIVE-1293|http://issues.apache.org/jira/browse/HIVE-1642]]) has solved this problem by converting the Common Join into a Map Join automatically. For a Map Join, the query processor must know which input table is the big table; the other input tables are recognized as the small tables during the execution stage, and they need to be held in memory. However, in general, the query processor has no idea of the input file sizes at compile time (even with statistics), because some of the tables may be intermediate tables generated from subqueries. So the query processor can only figure out the input file sizes at execution time.
+ Hive-1642 ([[http://issues.apache.org/jira/browse/HIVE-1642|http://issues.apache.org/jira/browse/HIVE-1642]]) has solved this problem by converting the Common Join into a Map Join automatically. For a Map Join, the query processor must know which input table is the big table; the other input tables are recognized as the small tables during the execution stage, and they need to be held in memory. However, in general, the query processor has no idea of the input file sizes at compile time (even with statistics), because some of the tables may be intermediate tables generated from subqueries. So the query processor can only figure out the input file sizes at execution time.
  
  Right now, users need to enable this feature with '''''set hive.auto.convert.join = true;'''''
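  
  For example, with the feature enabled, the hinted query from above needs no hint any more; Hive figures out at execution time which input is small enough to hold in memory:
  {{{
  -- Enable automatic conversion of Common Join to Map Join.
  set hive.auto.convert.join = true;
  -- No /*+ mapjoin(...) */ hint is needed.
  select * from src1 x join src2 y on (x.key = y.key);
  }}}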
  
