hive-issues mailing list archives

From "Sahil Takiar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-17237) HMS wastes 26.4% of memory due to dup strings in metastore.api.Partition.parameters
Date Mon, 07 Aug 2017 18:00:02 GMT

    [ https://issues.apache.org/jira/browse/HIVE-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16116948#comment-16116948 ]

Sahil Takiar commented on HIVE-17237:
-------------------------------------

Why replace the use of {{Interners.newWeakInterner}} with {{String#intern()}}?
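For context on the question: `String#intern()` canonicalizes through the JVM's built-in string table, while Guava's `Interners.newWeakInterner()` keeps its own pool that references its entries weakly, so a canonical copy can be garbage-collected once nothing else refers to it. The sketch below contrasts the two under stated assumptions. To stay dependency-free it does not use Guava; `WeakStringInterner` is a hypothetical stand-in built on `java.util.WeakHashMap`, not Guava's implementation.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical stand-in for a weak interner (NOT Guava's implementation).
// WeakHashMap holds keys weakly and the values are only WeakReferences, so an
// entry can be cleared once no caller references the canonical string.
final class WeakStringInterner {
    private final Map<String, WeakReference<String>> pool = new WeakHashMap<>();

    synchronized String intern(String s) {
        if (s == null) {
            return null;
        }
        WeakReference<String> ref = pool.get(s);
        String canonical = (ref == null) ? null : ref.get();
        if (canonical == null) {
            // First time this value is seen, or its canonical copy was collected.
            pool.put(s, new WeakReference<>(s));
            canonical = s;
        }
        return canonical;
    }
}

public class InternerDemo {
    public static void main(String[] args) {
        // new String(...) forces two distinct objects with equal contents.
        String a = new String("true");
        String b = new String("true");

        // JVM string table: both calls return the same canonical instance.
        System.out.println(a.intern() == b.intern()); // true

        // Separate weak pool: also canonicalizes, independently of the JVM table.
        WeakStringInterner interner = new WeakStringInterner();
        String x = interner.intern(a);
        String y = interner.intern(b);
        System.out.println(x == y); // true
        System.out.println(x == a); // true (the first instance seen becomes canonical)
    }
}
```

The sketch only shows that both paths canonicalize; it says nothing about relative speed or memory footprint, which would have to be measured on a real heap.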

> HMS wastes 26.4% of memory due to dup strings in metastore.api.Partition.parameters
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-17237
>                 URL: https://issues.apache.org/jira/browse/HIVE-17237
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>            Reporter: Misha Dmitriev
>            Assignee: Misha Dmitriev
>         Attachments: HIVE-17237.01.patch
>
>
> I've analyzed a heap dump from a production Hive installation using jxray (www.jxray.com). It turns out that there are many duplicate strings in memory, wasting 26.4% of the heap. Most of them come from HashMaps referenced by org.apache.hadoop.hive.metastore.api.Partition.parameters. Below is the relevant section of the jxray report.
> Looking at Partition.java, I see that somebody has already added code to intern keys and values in the parameters table when it is first set up. However, key-value pairs added later are not interned, which probably explains all these duplicate strings. Also, no interning of parameters is currently done when a Partition instance is deserialized.
> {code}
> 6. DUPLICATE STRINGS
> Total strings: 3,273,557  Unique strings: 460,390  Duplicate values: 110,232  Overhead: 3,220,458K (26.4%)
> ....
> ===================================================
> 7. REFERENCE CHAINS FOR DUPLICATE STRINGS
>   2,326,150K (19.1%), 597058 dup strings (36386 unique), 597058 dup backing arrays:
> 39949 of "-1", 39088 of "true", 28959 of "8", 20987 of "1", 18437 of "10", 9583 of "9", 5908 of "269664", 5691 of "174528", 4598 of "133980", 4598 of "BgUGBQgFCAYFCgYIBgUEBgQHBgUGCwYGBwYHBgkKBwYGBggIBwUHBgYGCgUJCQUG...[length 3560]"
> ... and 419200 more strings, of which 36376 are unique
> Also contains one-char strings: 217 of "6", 147 of "7", 91 of "4", 28 of "5", 28 of "2", 21 of "0"
>      <--  {j.u.HashMap}.values <-- org.apache.hadoop.hive.metastore.api.Partition.parameters <--  {j.u.ArrayList} <-- org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success <-- Java Local (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result) [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots
>   463,060K (3.8%), 119644 dup strings (34075 unique), 119644 dup backing arrays:
> 7914 of "true", 7912 of "-1", 6578 of "8", 5606 of "1", 2302 of "10", 1626 of "174528", 1223 of "9", 970 of "171680", 837 of "269664", 657 of "133980"
> ... and 84009 more strings, of which 34065 are unique
> Also contains one-char strings: 42 of "7", 31 of "6", 20 of "4", 8 of "5", 5 of "2", 3 of "0"
>      <--  {j.u.HashMap}.values <-- org.apache.hadoop.hive.metastore.api.Partition.parameters <--  {j.u.TreeMap}.values <-- Java Local (j.u.TreeMap) [@6f084afa0,@73aac9e68]
>   233,384K (1.9%), 64601 dup strings (27295 unique), 64601 dup backing arrays:
> 4472 of "true", 4173 of "-1", 3798 of "1", 3591 of "8", 813 of "174528", 684 of "10"
> ... and 44568 more strings, of which 27285 are unique
> Also contains one-char strings: 305 of "7", 301 of "0", 277 of "4", 146 of "6", 29 of "2", 23 of "5", 19 of "9", 2 of "3"
>      <--  {j.u.HashMap}.values <-- org.apache.hadoop.hive.metastore.api.Partition.parameters <--  {j.u.ArrayList} <-- Java Local (j.u.ArrayList) [@4f4cfbd10,@536122408,@726616778]
> ...
>   52,916K (0.4%), 597058 dup strings (16 unique), 597058 dup backing arrays:
>      <--  {j.u.HashMap}.keys <-- org.apache.hadoop.hive.metastore.api.Partition.parameters <--  {j.u.ArrayList} <-- org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success <-- Java Local (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result) [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots
> {code}
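The issue describes two gaps: interning happens only when the parameters map is first set up, not for pairs added later, and not after a Partition is deserialized. A minimal sketch of closing both, assuming the parameters are a plain `Map<String, String>`: route later insertions through an interning helper, re-intern the whole map once after deserialization, and use a crude duplicate count to check the effect. `ParamInterning`, its methods, and the keys `k1`, `k2` and `k3` are hypothetical, not Hive's API. `String#intern()` is used only for brevity; a weak interner would slot into the same private `intern` helper. The counter is a rough approximation for illustration, not how jxray computes its figures, and it reports copies rather than bytes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helpers (not Hive's API): intern parameter keys and values at
// every entry point, not only when the map is first populated.
final class ParamInterning {
    private ParamInterning() {}

    // Use for every insertion after construction. Note: if the key is already
    // present, HashMap keeps the existing key object; only the value is replaced.
    static void putInterned(Map<String, String> params, String key, String value) {
        params.put(intern(key), intern(value));
    }

    // Call once after deserialization: copies the map with interned strings.
    static Map<String, String> internAll(Map<String, String> params) {
        if (params == null) {
            return null;
        }
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> e : params.entrySet()) {
            result.put(intern(e.getKey()), intern(e.getValue()));
        }
        return result;
    }

    // Rough check: distinct value objects minus distinct value contents,
    // i.e. the number of redundant copies across all maps.
    static int redundantValueCopies(List<Map<String, String>> maps) {
        Set<String> objects = Collections.newSetFromMap(new IdentityHashMap<>());
        Set<String> contents = new HashSet<>();
        for (Map<String, String> m : maps) {
            for (String v : m.values()) {
                if (v != null) {
                    objects.add(v);   // identity: each object counted once
                    contents.add(v);  // equals(): each distinct value counted once
                }
            }
        }
        return objects.size() - contents.size();
    }

    private static String intern(String s) {
        return s == null ? null : s.intern();
    }
}

public class ParamInterningDemo {
    public static void main(String[] args) {
        // Simulate three deserialized maps: equal values, but a fresh String each time.
        List<Map<String, String>> maps = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            Map<String, String> m = new HashMap<>();
            m.put("k1", new String("true"));
            m.put("k2", new String("-1"));
            maps.add(m);
        }
        System.out.println(ParamInterning.redundantValueCopies(maps)); // 4

        // Re-intern after "deserialization", then add a pair through the helper.
        for (int i = 0; i < maps.size(); i++) {
            Map<String, String> m = ParamInterning.internAll(maps.get(i));
            ParamInterning.putInterned(m, "k3", new String("8"));
            maps.set(i, m);
        }
        System.out.println(ParamInterning.redundantValueCopies(maps)); // 0
    }
}
```

Re-interning the whole map after deserialization covers values that arrive outside the helper, while the put-time helper keeps later additions from reintroducing copies.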



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
