hive-issues mailing list archives

From Sergio Peña (JIRA) <j...@apache.org>
Subject [jira] [Commented] (HIVE-16166) HS2 may still waste up to 15% of memory on duplicate strings
Date Thu, 16 Mar 2017 14:51:41 GMT

    [ https://issues.apache.org/jira/browse/HIVE-16166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928202#comment-15928202 ]

Sergio Peña commented on HIVE-16166:
------------------------------------

[~misha@cloudera.com] HiveQA has already finished, but it could not post the results to the JIRA.

Here are the results:

https://builds.apache.org/view/H-L/view/Hive/job/PreCommit-HIVE-Build/4164/

Test Result (10 failures / +10)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[dynamic_partition_pruning_2]
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_partitioned]
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_dynamic_partition_pruning]
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_part]
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_partition_pruning]
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_part_varchar]
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_semijoin_reduction]
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[special_character_in_tabnames_1]
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[vector_join_part_col_char]

> HS2 may still waste up to 15% of memory on duplicate strings
> ------------------------------------------------------------
>
>                 Key: HIVE-16166
>                 URL: https://issues.apache.org/jira/browse/HIVE-16166
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Misha Dmitriev
>            Assignee: Misha Dmitriev
>         Attachments: ch_2_excerpt.txt, HIVE-16166.01.patch
>
>
> A heap dump obtained from one of our users shows that 15% of memory is wasted on duplicate
> strings, despite the recent optimizations that I made. The problematic strings just come from
> different sources this time. See the excerpt from the jxray (www.jxray.com) analysis attached.
> Adding String.intern() calls in the appropriate places reduces the overhead of duplicate
> strings with this workload to ~6%. The remaining duplicates come mostly from JDK internal
> and MapReduce data structures, and thus are more difficult to fix.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
