hive-dev mailing list archives

From "Misha Dmitriev (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-16166) HS2 may still waste up to 15% of memory on duplicate strings
Date Fri, 10 Mar 2017 01:06:37 GMT
Misha Dmitriev created HIVE-16166:
-------------------------------------

             Summary: HS2 may still waste up to 15% of memory on duplicate strings
                 Key: HIVE-16166
                 URL: https://issues.apache.org/jira/browse/HIVE-16166
             Project: Hive
          Issue Type: Improvement
            Reporter: Misha Dmitriev
            Assignee: Misha Dmitriev


A heap dump obtained from one of our users shows that 15% of memory is still wasted on duplicate
strings, despite the optimizations that I made recently. This time the problematic strings simply
come from different sources. See the attached excerpt from the jxray (www.jxray.com) analysis.

Adding String.intern() calls in the appropriate places reduces the overhead of duplicate strings
with this workload to ~6%. The remaining duplicates come mostly from JDK-internal and MapReduce
data structures, and are therefore more difficult to fix.
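
For illustration only, here is a minimal sketch of the general String.intern() technique
mentioned above. It is not the actual change for this issue: the class InternSketch and the
method internAll are hypothetical names, and the sample value is just a typical long,
frequently repeated string. The idea is that equal strings created independently are
replaced by one canonical instance from the JVM string pool, so the copies can be collected.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hive: shows how interning collapses
// equal-but-distinct String objects into one shared instance.
final class InternSketch {

    // Returns a new list in which every non-null element is replaced by its
    // canonical instance from the JVM string pool. Nulls are passed through.
    static List<String> internAll(List<String> values) {
        List<String> result = new ArrayList<>(values.size());
        for (String value : values) {
            result.add(value == null ? null : value.intern());
        }
        return result;
    }

    public static void main(String[] args) {
        // new String(...) forces two separate objects with equal contents,
        // similar to strings deserialized independently from different sources.
        List<String> raw = new ArrayList<>();
        raw.add(new String("org.apache.hadoop.mapred.TextInputFormat"));
        raw.add(new String("org.apache.hadoop.mapred.TextInputFormat"));

        List<String> interned = internAll(raw);

        System.out.println("same object before interning: " + (raw.get(0) == raw.get(1)));
        System.out.println("same object after interning:  " + (interned.get(0) == interned.get(1)));
    }
}
```

Interning pays off only for values that repeat many times and live long; interning mostly
unique strings just adds entries to the string pool without saving memory.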



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
