hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Misha Dmitriev <mi...@cloudera.com>
Subject Re: Review Request 56687: Intern strings in various critical places to reduce memory consumption.
Date Fri, 24 Feb 2017 21:27:57 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56687/
-----------------------------------------------------------

(Updated Feb. 24, 2017, 9:27 p.m.)


Review request for hive, Chaoyu Tang, Mohit Sabharwal, and Sergio Pena.


Changes
-------

Addressed Rui's and Mohit's comments.


Bugs: https://issues.apache.org/jira/browse/HIVE-15882
    https://issues.apache.org/jira/browse/https://issues.apache.org/jira/browse/HIVE-15882


Repository: hive-git


Description
-------

See the description of the problem in https://issues.apache.org/jira/browse/HIVE-15882 Interning
strings per this review removes most of the overhead due to duplicate strings.

Also, where maps in several places are created from other maps, use the original map's size
for the new map. This is to avoid the situation when a map with default capacity (typically
16) is created to hold just 2-3 entries, and the rest of the internal 16-entry array is wasted.


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/common/StringInternUtils.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java e81cbce3e333d44a4088c10491f399e92a505293

  ql/src/java/org/apache/hadoop/hive/ql/hooks/Entity.java 08420664d59f28f75872c25c9f8ee42577b23451

  ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java e91064b9c75e8adb2b36f21ff19ec0c1539b03b9

  ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 51530ac16c92cc75d501bfcb573557754ba0c964

  ql/src/java/org/apache/hadoop/hive/ql/io/SymbolicInputFormat.java 55b3b551a1dac92583b6e03b10beb8172ca93d45

  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveLockObject.java 82dc89803be9cf9e0018720eeceb90ff450bfdc8

  ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java c0edde9e92314d86482b5c46178987e79fae57fe

  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java c6ae6f290857cfd10f1023058ede99bf4a10f057

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 24d16812515bdfa90b4be7a295c0388fcdfe95ef

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java ede4fcbe342052ad86dadebcc49da2c0f515ea98

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/NullScanTaskDispatcher.java 0882ae2c6205b1636cbc92e76ef66bb70faadc76

  ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java 68b0ad9ea63f051f16fec3652d8525f7ab07eb3f

  ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java d4bdd96eaf8d179bed43b8a8c3be0d338940154a

  ql/src/java/org/apache/hadoop/hive/ql/plan/MsckDesc.java b7a7e4b7a5f8941b080c7805d224d3885885f444

  ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java 73981e826870139a42ad881103fdb0a2ef8433a2


Diff: https://reviews.apache.org/r/56687/diff/


Testing
-------

I've measured how much memory this change plus another one (interning Properties in PartitionDesc)
save in my HS2 benchmark - the result is 37%. See the details in HIVE-15882.


Thanks,

Misha Dmitriev


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message