hive-dev mailing list archives

From j.prasant...@gmail.com
Subject Re: Review Request 58777: HIVE-16546: LLAP: Fail map join tasks if hash table memory exceeds threshold
Date Thu, 27 Apr 2017 19:45:45 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58777/#review173243
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HashMapWrapper.java
Lines 148 (patched)
<https://reviews.apache.org/r/58777/#comment246289>

    The key and value for the non-optimized hash table loader are Object[] arrays, which hold
either serialized binary objects or deserialized objects corresponding to column values. It would
be very intrusive to add memory estimation for all types, OIs, writables, etc., so the assumption
here is that each entry in the hash table is 1KB in size. In most cases we use the optimized hash
table, which is pretty much flat and can provide better in-memory estimates. The best way to find
the deep object size is to iterate over all declared fields and use the instrumentation object
size to find the actual size, but that needs a separate agent combined with reflection :)
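
    To make the 1KB assumption concrete, here is a minimal sketch of a fixed per-entry estimate.
The class and method names are hypothetical and are not taken from the patch; only the 1KB
per-entry figure comes from the comment above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of the patch): fixed-size-per-entry memory estimate. */
final class FixedEntrySizeEstimate {
    /** Assumed size of one Object[] key/value entry, per the 1KB assumption. */
    static final long ASSUMED_ENTRY_BYTES = 1024L;

    /** Returns entryCount * 1KB, failing loudly on overflow or a negative count. */
    static long estimate(long entryCount) {
        if (entryCount < 0) {
            throw new IllegalArgumentException("entryCount must be >= 0");
        }
        return Math.multiplyExact(entryCount, ASSUMED_ENTRY_BYTES);
    }

    public static void main(String[] args) {
        // Stand-in for the non-optimized table: keys and values are opaque Object[] rows.
        Map<Object[], Object[]> table = new HashMap<>();
        table.put(new Object[] {1}, new Object[] {"a", 10L});
        table.put(new Object[] {2}, new Object[] {"b", 20L});
        System.out.println("estimated bytes: " + estimate(table.size()));
    }
}
```

    The estimate depends only on the entry count, so it is cheap to compute but can be far off
for very narrow or very wide rows.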



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java
Lines 228 (patched)
<https://reviews.apache.org/r/58777/#comment246291>

    Good catch. My bad. Both should not be multiplied by the inflation factor. Only the
no-conditional task size has to be multiplied by the inflation factor.
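
    As a rough sketch of the intended check (hypothetical names, not the code in the patch):
the threshold is the no-conditional task size multiplied by the inflation factor, and the
estimated hash table memory is compared against it without any further scaling.

```java
/** Hypothetical sketch of the threshold check; names are illustrative only. */
final class HashTableMemoryCheck {
    /** Only the no-conditional task size is scaled by the inflation factor. */
    static long threshold(long noConditionalTaskSize, double inflationFactor) {
        return (long) (noConditionalTaskSize * inflationFactor);
    }

    /** Throws if the (unscaled) hash table estimate exceeds the threshold. */
    static void checkOrFail(long estimatedBytes, long noConditionalTaskSize,
                            double inflationFactor) {
        long limit = threshold(noConditionalTaskSize, inflationFactor);
        if (estimatedBytes > limit) {
            throw new IllegalStateException("Hash table estimate " + estimatedBytes
                + " bytes exceeds threshold " + limit + " bytes");
        }
    }

    public static void main(String[] args) {
        // Example values are assumptions chosen for illustration.
        long noConditionalTaskSize = 10L * 1024 * 1024;
        checkOrFail(15L * 1024 * 1024, noConditionalTaskSize, 2.0); // within the limit
        System.out.println("within threshold");
    }
}
```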
    
    Regarding compressed tables: it actually depends. With ORC, for example, even if the table
is compressed, the raw data size returned by the ORC reader represents the uncompressed data size.
The metastore stores both the file size (compressed) and the raw data size. Statistics annotation
will use the raw data size when available; otherwise hive.stats.deserialization.factor can be set
to account for inflation.
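
    The fallback described above can be sketched as follows. This is an illustration under
stated assumptions, not Hive's statistics annotation code: the class and method are hypothetical,
and a raw data size of zero or less is treated as "not available".

```java
/** Hypothetical sketch of the data size fallback; not Hive's actual stats code. */
final class DataSizeEstimate {
    /**
     * Prefers the raw (uncompressed) data size when it is available; otherwise
     * scales the on-disk file size by the deserialization factor.
     */
    static long estimate(long rawDataSize, long fileSize, double deserializationFactor) {
        if (rawDataSize > 0) {
            return rawDataSize;
        }
        return (long) (fileSize * deserializationFactor);
    }

    public static void main(String[] args) {
        // Values below are made up for illustration.
        System.out.println(estimate(500L, 100L, 3.0)); // raw data size known
        System.out.println(estimate(0L, 100L, 3.0));   // falls back to file size * factor
    }
}
```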


- Prasanth_J


On April 27, 2017, 8:43 a.m., Prasanth_J wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58777/
> -----------------------------------------------------------
> 
> (Updated April 27, 2017, 8:43 a.m.)
> 
> 
> Review request for hive, Gunther Hagleitner, Sergey Shelukhin, and Siddharth Seth.
> 
> 
> Bugs: HIVE-16546
>     https://issues.apache.org/jira/browse/HIVE-16546
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-16546: LLAP: Fail map join tasks if hash table memory exceeds threshold
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/common/MemoryEstimate.java PRE-CREATION
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java d3ea824
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/BytesBytesMultiHashMap.java 04e24bd
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HashMapWrapper.java a3bccc6
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HybridHashTableContainer.java 04e89e8
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java c86e5f5
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java 6d71fef
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java 7b13e90
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ObjectCache.java 72dcdd3
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastBytesHashMap.java 6242daf
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastBytesHashMultiSet.java 1a41961
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastBytesHashSet.java 331867c
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastBytesHashTable.java b93e977
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastHashTable.java b6db3bc
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastHashTableLoader.java 49ecdd1
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastKeyStore.java be51693
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastLongHashMap.java 6fe98f9
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastLongHashMultiSet.java 9140aee
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastLongHashSet.java d3efb11
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastLongHashTable.java 8bfa07c
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastMultiKeyHashMap.java add4788
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastMultiKeyHashMultiSet.java faefdbb
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastMultiKeyHashSet.java 5328910
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastStringHashMap.java f13034f
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastStringHashMultiSet.java 53ad7b4
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastStringHashSet.java 723c729
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastTableContainer.java 05f1cf1
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastValueStore.java f9c5b34
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/hashtable/VectorMapJoinHashTable.java c7e585c
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedHashSet.java 93a89d7
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedHashTable.java 5fe7861
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedStringHashSet.java f921b9c
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java ad77e87
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java b2893e7
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveOpConverter.java d375d1b
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java 93b8a5d
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/GenSparkSkewJoinProcessor.java 405c3ca
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java d39b8bd
>   ql/src/java/org/apache/hadoop/hive/ql/plan/JoinDesc.java 032c7bb
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java 940630c
>   serde/src/java/org/apache/hadoop/hive/serde2/WriteBuffers.java a4ecd9f
> 
> 
> Diff: https://reviews.apache.org/r/58777/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Prasanth_J
> 
>

