hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/JoinOptimization" by LiyinTang
Date Tue, 30 Nov 2010 21:49:56 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/JoinOptimization" page has been changed by LiyinTang.
http://wiki.apache.org/hadoop/Hive/JoinOptimization?action=diff&rev1=3&rev2=4

--------------------------------------------------

  == 1.1 Using Distributed Cache to Propagate Hashtable File ==
  Previously, when two large tables needed to be joined, two different Mappers would sort the tables on the join key and emit intermediate files, and the Reducer would then take these intermediate files as input and do the real join work. This join mechanism works well when both tables are large. But if one of the join tables is small enough to fit into the Mapper's memory, there is no need to launch the Reducer at all. In fact, the Reduce stage is quite expensive, because the Map/Reduce framework has to sort and merge the intermediate files.
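  For illustration, consider a plain equi-join between two such tables (the table and column names below are hypothetical, not from this page). With no hints, Hive executes it as the common join just described:

  {{{
  -- Both 'orders' and 'users' are assumed to be large tables.
  -- Hive plans this as a common (reduce-side) join: the Mappers emit rows
  -- keyed on user_id, the framework sorts and merges the intermediate
  -- files, and the Reducer produces the joined rows.
  SELECT o.order_id, o.amount, u.name
  FROM orders o
  JOIN users u ON (o.user_id = u.user_id);
  }}}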
  
  {{attachment:fig1.jpg||height="764px",width="911px"}}
  
  '''Fig 1. The Previous Map Join'''
  
  So the basic idea of map join is to hold the data of the small table in the Mapper's memory and do the join work in the Map stage, which saves the Reduce stage entirely. As shown in Fig 1, however, the previous map join implementation does not scale to large data, because each Mapper reads the small table directly from HDFS. If the big table is large enough, thousands of Mappers are launched to read its different records, and all of them pull the small table from HDFS into memory. This makes the small table a performance bottleneck; worse, Mappers sometimes time out while reading the small file, which can cause the task to fail.
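  At the query level, this kind of map join is requested with the MAPJOIN hint, which names the table to hold in memory (continuing the hypothetical tables from the sketch above):

  {{{
  -- 'users' is now assumed small enough to fit in each Mapper's memory.
  -- The MAPJOIN hint makes Hive build an in-memory hashtable from 'users'
  -- and stream 'orders' through the Mappers, probing the hashtable for
  -- matches, so the whole join finishes in the Map stage.
  SELECT /*+ MAPJOIN(u) */ o.order_id, o.amount, u.name
  FROM orders o
  JOIN users u ON (o.user_id = u.user_id);
  }}}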

  {{attachment:fig2.jpg}}

  HIVE-1641 ([[http://issues.apache.org/jira/browse/HIVE-1641]]) solves this problem. As shown in Fig 2, the basic idea is to create a new task, a MapReduce Local Task, before the original Join Map/Reduce Task. This new task reads the small table data from HDFS into an in-memory hashtable, then serializes the hashtable into files on disk and compresses them into a tar file. In the next stage, when the MapReduce task is launched, it puts the tar file into the Hadoop Distributed Cache, which populates the file onto each Mapper's local disk and decompresses it there. Each Mapper can then deserialize the hashtable file back into memory and do the join work as before.
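  From the query writer's point of view the local task is transparent: the same MAPJOIN query triggers it. One related knob is the local task's memory bound; the property name below matches Hive of this era, but treat it as an assumption and check it against your version:

  {{{
  -- Assumed setting: abort the MapReduce Local Task if building the
  -- in-memory hashtable consumes more than 90% of its heap.
  set hive.mapjoin.localtask.max.memory.usage=0.90;

  SELECT /*+ MAPJOIN(u) */ o.order_id, o.amount, u.name
  FROM orders o
  JOIN users u ON (o.user_id = u.user_id);
  }}}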
  
  == 1.2 Removing JDBM ==
  When profiling the Map Join, the JDBM component used for the hashtable turned out to be a performance bottleneck, so it has been removed.

  == 1.3 Performance Evaluation ==
  = 2. Converting Join into Map Join dynamically =
  == 2.1 Join Execution Flow ==
