hive-dev mailing list archives

From "Wei Zheng (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-13755) Hybrid mapjoin allocates memory the same for multi broadcast
Date Fri, 13 May 2016 05:52:13 GMT
Wei Zheng created HIVE-13755:
--------------------------------

             Summary: Hybrid mapjoin allocates memory the same for multi broadcast
                 Key: HIVE-13755
                 URL: https://issues.apache.org/jira/browse/HIVE-13755
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 2.1.0
            Reporter: Wei Zheng
            Assignee: Wei Zheng


PROBLEM:

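When the hybrid map join works out how much memory it may use, it gives every hash table the same estimate: each one is told the full available memory (in the log below, every hash table load reports "Total available memory: 1968177152"). With multiple broadcast inputs this is a problem, because together the hash tables can use more than the memory intended for the join.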
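To make the difference concrete, here is a minimal Python sketch. It is not Hive code: the function names and the proportional policy are hypothetical. The first function mirrors the reported behaviour, where each hash table is given the whole budget. The second shows one possible alternative that divides a single budget among the broadcast inputs in proportion to their estimated sizes. The sample numbers are the "Total available memory" and "Estimated small table size" values from the log excerpt below.

```python
def equal_budgets(total_memory, estimated_sizes):
    """Reported behaviour (simplified): every hash table is told it may use
    the whole budget, regardless of how many broadcast inputs share it."""
    return [total_memory for _ in estimated_sizes]


def proportional_budgets(total_memory, estimated_sizes):
    """Hypothetical alternative: split one budget across all broadcast
    inputs in proportion to each table's estimated size."""
    if not estimated_sizes:
        return []
    total_estimate = sum(estimated_sizes)
    if total_estimate == 0:
        # No size information: fall back to an even split.
        share = total_memory // len(estimated_sizes)
        return [share for _ in estimated_sizes]
    # Floor division keeps the sum of the shares <= total_memory.
    return [total_memory * size // total_estimate for size in estimated_sizes]


if __name__ == "__main__":
    total = 1968177152                          # "Total available memory" in the log
    estimates = [155190, 1324101915, 69929471]  # "Estimated small table size" values
    print(sum(equal_budgets(total, estimates)))         # three times the budget
    print(sum(proportional_budgets(total, estimates)))  # at most the budget
```

With the sample values, the equal policy promises 3 × 1968177152 bytes in total, while the proportional split never promises more than 1968177152. Whether the budget should be split by estimated size, evenly, or by some other rule is a design choice this sketch does not settle.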

An example reducer task log is attached. This task has four broadcast inputs:

Reducer 3 <- Map 10 (BROADCAST_EDGE), Map 11 (BROADCAST_EDGE), Map 12 (BROADCAST_EDGE), Map 8 (SIMPLE_EDGE), Map 9 (BROADCAST_EDGE), Reducer 2 (SIMPLE_EDGE)

An excerpt of the log:

{code}
2016-03-15 19:23:50,811 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Memory manager allocates 0 bytes for the loading hashtable.
2016-03-15 19:23:50,811 [INFO] [pool-47-thread-1] |persistence.HashMapWrapper|: Key count from statistics is 210; setting map size to 280
2016-03-15 19:23:50,811 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Total available memory: 1968177152
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Estimated small table size: 155190
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Number of hash partitions to be created: 16
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Write buffer size: 524288
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Number of partitions created: 16
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Number of partitions spilled directly to disk on creation: 0
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Using tableContainer HybridHashTableContainer
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Initializing container with org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe and org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
2016-03-15 19:23:50,814 [INFO] [pool-47-thread-1] |readers.UnorderedKVReader|: Num Records read: 20
2016-03-15 19:23:50,814 [INFO] [pool-47-thread-1] |log.PerfLogger|: </PERFLOG method=LoadHashtable start=1458069830811 end=1458069830814 duration=3 from=org.apache.hadoop.hive.ql.exec.MapJoinOperator>
2016-03-15 19:23:50,814 [INFO] [pool-47-thread-1] |tez.ObjectCache|: Caching key: svc-phx-efmhadoop_20160315191303_8c53ce88-e64f-4d36-bad0-846bbf096f57__HASH_MAP_MAPJOIN_126_container
2016-03-15 19:23:50,814 [INFO] [TezChild] |exec.HashTableDummyOperator|: Initializing operator HASHTABLEDUMMY[32]
2016-03-15 19:23:50,814 [INFO] [TezChild] |exec.MapJoinOperator|: Initializing operator MAPJOIN[26]
2016-03-15 19:23:50,816 [INFO] [TezChild] |exec.CommonJoinOperator|: JOIN struct<_col3:string,_col4:decimal(5,0),_col5:char(1),_col6:char(1),_col7:date,_col8:string,_col9:string,_col12:string,_col13:string,_col14:string,_col15:string,_col16:string,_col19:decimal(13,3),_col20:string,_col22:decimal(5,0),_col23:decimal(5,0),_col24:decimal(5,0),_col25:decimal(5,0),_col26:decimal(13,2),_col27:decimal(5,0),_col28:decimal(15,2),_col29:decimal(15,2),_col31:decimal(3,0),_col33:char(1),_col41:decimal(3,1),_col42:char(1),_col43:decimal(3,1),_col44:string,_col45:char(1),_col48:char(1),_col55:char(1),_col57:char(1),_col59:char(1),_col60:string,_col64:string,_col65:string,_col67:decimal(15,2),_col76:decimal(3,0),_col81:char(1),_col98:string,_col99:string,_col105:string,_col108:string,_col122:string,_col123:decimal(5,0),_col127:string,_col128:decimal(5,0),_col129:string,_col137:char(1),_col139:string,_col145:string,_col151:string,_col152:string,_col154:string,_col158:char(1),_col164:char(1),_col204:string,_col213:string,_col214:char(1),_col215:string,_col218:char(1),_col219:date,_col220:string,_col221:decimal(5,0),_col222:decimal(5,0),_col223:string,_col224:char(1),_col225:string,_col226:decimal(3,0),_col231:string,_col232:string,_col233:string,_col234:decimal(9,5),_col236:date,_col240:date,_col256:string,_col257:string,_col268:string,_col269:string,_col270:char(1),_col271:string,_col272:char(1),_col324:string,_col344:string,_col464:string,_col478:decimal(5,0),_col479:decimal(5,0),_col519:string,_col532:string,_col534:char(1),_col540:decimal(13,3),_col541:decimal(13,3),_col561:string,_col568:char(1),_col570:string>
totalsz = 95
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |log.PerfLogger|: <PERFLOG method=LoadHashtable from=org.apache.hadoop.hive.ql.exec.MapJoinOperator>
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Memory manager allocates 0 bytes for the loading hashtable.
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |persistence.HashMapWrapper|: Key count from statistics is 5942112; setting map size to 7922816
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Total available memory: 1968177152
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Estimated small table size: 1324101915
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Number of hash partitions to be created: 16
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Write buffer size: 8388608
2016-03-15 19:23:50,831 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Number of partitions created: 16
2016-03-15 19:23:50,831 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Number of partitions spilled directly to disk on creation: 0
2016-03-15 19:23:50,831 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Using tableContainer HybridHashTableContainer
2016-03-15 19:23:50,831 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Initializing container with org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe and org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
2016-03-15 19:23:51,543 [INFO] [pool-47-thread-1] |readers.UnorderedKVReader|: Num Records read: 852596
2016-03-15 19:23:51,563 [INFO] [pool-47-thread-1] |log.PerfLogger|: </PERFLOG method=LoadHashtable start=1458069830817 end=1458069831563 duration=746 from=org.apache.hadoop.hive.ql.exec.MapJoinOperator>
2016-03-15 19:23:51,563 [INFO] [pool-47-thread-1] |tez.ObjectCache|: Caching key: svc-phx-efmhadoop_20160315191303_8c53ce88-e64f-4d36-bad0-846bbf096f57__HASH_MAP_MAPJOIN_127_container
2016-03-15 19:23:51,563 [INFO] [TezChild] |exec.HashTableDummyOperator|: Initializing operator HASHTABLEDUMMY[31]
2016-03-15 19:23:51,564 [INFO] [TezChild] |exec.MapJoinOperator|: Initializing operator MAPJOIN[27]
2016-03-15 19:23:51,566 [INFO] [TezChild] |exec.CommonJoinOperator|: JOIN struct<_col3:string,_col4:decimal(5,0),_col5:char(1),_col6:char(1),_col7:date,_col8:string,_col9:string,_col12:string,_col13:string,_col14:string,_col15:string,_col16:string,_col19:decimal(13,3),_col20:string,_col22:decimal(5,0),_col23:decimal(5,0),_col24:decimal(5,0),_col25:decimal(5,0),_col26:decimal(13,2),_col27:decimal(5,0),_col28:decimal(15,2),_col29:decimal(15,2),_col31:decimal(3,0),_col33:char(1),_col41:decimal(3,1),_col42:char(1),_col43:decimal(3,1),_col44:string,_col45:char(1),_col48:char(1),_col55:char(1),_col57:char(1),_col59:char(1),_col60:string,_col64:string,_col65:string,_col67:decimal(15,2),_col76:decimal(3,0),_col81:char(1),_col98:string,_col99:string,_col105:string,_col108:string,_col122:string,_col123:decimal(5,0),_col127:string,_col128:decimal(5,0),_col129:string,_col137:char(1),_col139:string,_col145:string,_col151:string,_col152:string,_col154:string,_col158:char(1),_col164:char(1),_col204:string,_col213:string,_col214:char(1),_col215:string,_col218:char(1),_col219:date,_col220:string,_col221:decimal(5,0),_col222:decimal(5,0),_col223:string,_col224:char(1),_col225:string,_col226:decimal(3,0),_col231:string,_col232:string,_col233:string,_col234:decimal(9,5),_col236:date,_col240:date,_col256:string,_col257:string,_col268:string,_col269:string,_col270:char(1),_col271:string,_col272:char(1),_col324:string,_col344:string,_col464:string,_col478:decimal(5,0),_col479:decimal(5,0),_col519:string,_col532:string,_col534:char(1),_col540:decimal(13,3),_col541:decimal(13,3),_col561:string>
totalsz = 93
2016-03-15 19:23:51,566 [INFO] [pool-47-thread-1] |log.PerfLogger|: <PERFLOG method=LoadHashtable from=org.apache.hadoop.hive.ql.exec.MapJoinOperator>
2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Memory manager allocates 0 bytes for the loading hashtable.
2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] |persistence.HashMapWrapper|: Key count from statistics is 293380; setting map size to 391174
2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Total available memory: 1968177152
2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Estimated small table size: 69929471
2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Number of hash partitions to be created: 16
2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Write buffer size: 4194304
2016-03-15 19:23:51,568 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Number of partitions created: 16
2016-03-15 19:23:51,568 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Number of partitions spilled directly to disk on creation: 0
2016-03-15 19:23:51,568 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Using tableContainer HybridHashTableContainer
2016-03-15 19:23:51,569 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Initializing container with org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe and org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
2016-03-15 19:23:51,980 [INFO] [pool-47-thread-1] |readers.UnorderedKVReader|: Num Records read: 586760
{code}
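The overcommit can be checked directly against task logs. The hypothetical helper below (plain Python, not part of Hive) collects the "Total available memory" and "Estimated small table size" values that HybridHashTableContainer logs for each hash table load. It assumes the message wording shown in the excerpt above, and it matches on the message text only, so it works whether or not the log lines are hard-wrapped.

```python
import re

# Message wording as it appears in the HybridHashTableContainer lines of the log.
TOTAL_RE = re.compile(r"Total available memory: (\d+)")
ESTIMATE_RE = re.compile(r"Estimated small table size: (\d+)")


def summarize_hash_table_loads(log_text):
    """Pair each hash table load's reported budget with its size estimate."""
    totals = [int(value) for value in TOTAL_RE.findall(log_text)]
    estimates = [int(value) for value in ESTIMATE_RE.findall(log_text)]
    return {
        # One (budget, estimate) tuple per hash table load, in log order.
        "loads": list(zip(totals, estimates)),
        "sum_of_budgets": sum(totals),
        "sum_of_estimates": sum(estimates),
        # True when every load reports the same budget (or there are no loads).
        "same_budget_for_every_load": len(set(totals)) <= 1,
    }


if __name__ == "__main__":
    sample = """
    |persistence.HybridHashTableContainer|: Total available memory: 1968177152
    |persistence.HybridHashTableContainer|: Estimated small table size: 155190
    |persistence.HybridHashTableContainer|: Total available memory: 1968177152
    |persistence.HybridHashTableContainer|: Estimated small table size: 1324101915
    |persistence.HybridHashTableContainer|: Total available memory: 1968177152
    |persistence.HybridHashTableContainer|: Estimated small table size: 69929471
    """
    summary = summarize_hash_table_loads(sample)
    print(summary["sum_of_budgets"], summary["sum_of_estimates"])
```

For the three hash table loads in the excerpt, every load reports the same 1968177152-byte budget, so together they are planned against 3 × 1968177152 = 5904531456 bytes rather than sharing one budget. Their estimated sizes alone already add up to 1394186576 bytes, about 71% of that single budget.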



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
