hive-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-14168) Avoid serializing all parameters from HiveConf.java into in-memory HiveConf instances
Date Sat, 09 Jul 2016 00:09:11 GMT

    [ https://issues.apache.org/jira/browse/HIVE-14168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368780#comment-15368780 ]

Sergey Shelukhin commented on HIVE-14168:
-----------------------------------------

I was recently looking at that code. That was rather surprising and sometimes actually results
in unexpected behavior (e.g. schematool tries to get connection settings ensuring they are
set, without the default values, but with this magic map the default values are returned to
it anyway since they are explicitly added to configuration).
I think it would be a good idea to remove this, but I didn't have time then to investigate
in detail. Looking at the history of this feature may shed some light on why this is done.
Also there may be code that relies on this behavior unwittingly, but I think we should fix
it after removing the map rather than looking for it in advance.

> Avoid serializing all parameters from HiveConf.java into in-memory HiveConf instances
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-14168
>                 URL: https://issues.apache.org/jira/browse/HIVE-14168
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Priority: Critical
>
> All non-null parameters from HiveConf.java are explicitly set in each HiveConf instance.
> {code}
> // Overlay the ConfVars. Note that this ignores ConfVars with null values
>     addResource(getConfVarInputStream());
> {code}
> This unnecessarily bloats each Configuration object - 400+ conf variables being set instead
> of probably <30 which would exist in hive-site.xml.
> Looking at an HS2 heap dump - HiveConf is almost always the largest component by a long
> way. Conf objects are also serialized very often - transmitting lots of unneeded variables
> (serialized Hive conf is typically 1000+ variables - due to Hadoop injecting its configs
> into every config instance).
> As long as HiveConf.get() is the approach used to read from a config - this is avoidable.
> Hive code itself should be doing this.
> This would be a potentially incompatible change for UDFs and other plugins which have
> access to a Configuration object.
> I'd suggest turning off the insert by default, and adding a flag to control this.
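
To make the trade-off above concrete, here is a minimal sketch in plain Java. It is not Hive's code: the class ConfOverlaySketch, its methods, and the "example.*" keys are all hypothetical stand-ins, and ordinary maps take the place of Configuration objects. It contrasts copying every compiled-in default into each instance with storing only the site values and resolving defaults at read time.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class ConfOverlaySketch {
    // Hypothetical stand-in for the defaults compiled into HiveConf.java.
    // The keys are made up for illustration and are not real Hive settings.
    static final Map<String, String> DEFAULTS = new LinkedHashMap<>();
    static {
        DEFAULTS.put("example.connection.url", "jdbc:example:default");
        DEFAULTS.put("example.exec.parallel", "false");
        DEFAULTS.put("example.fetch.size", "1000");
    }

    // Eager variant: every default is copied into the instance, then
    // the site values (standing in for hive-site.xml) override them.
    static Map<String, String> eagerConf(Map<String, String> site) {
        Map<String, String> props = new HashMap<>(DEFAULTS);
        props.putAll(site);
        return props;
    }

    // Lazy variant: the instance holds only what the site file set.
    static Map<String, String> lazyConf(Map<String, String> site) {
        return new HashMap<>(site);
    }

    // Typed-accessor style read: fall back to the compiled-in default
    // only when the instance has no explicit value.
    static String getWithDefault(Map<String, String> props, String key) {
        String value = props.get(key);
        return value != null ? value : DEFAULTS.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> site = new HashMap<>();
        site.put("example.exec.parallel", "true");

        Map<String, String> eager = eagerConf(site);
        Map<String, String> lazy = lazyConf(site);

        // The eager instance carries every default; the lazy one does not.
        System.out.println("eager entries: " + eager.size());
        System.out.println("lazy entries:  " + lazy.size());

        // A raw lookup cannot tell "explicitly set" from "defaulted" in the
        // eager instance, which mirrors the schematool surprise in the comment.
        System.out.println("eager raw url: " + eager.get("example.connection.url"));
        System.out.println("lazy raw url:  " + lazy.get("example.connection.url"));

        // Reads that go through the fallback accessor see the same value either way.
        System.out.println("lazy url via accessor: "
                + getWithDefault(lazy, "example.connection.url"));
    }
}
```

In this sketch, code that reads through the fallback accessor behaves the same under both variants, while code that does raw lookups on the underlying object (the UDF and plugin case mentioned above) would start seeing missing values once the eager copy is turned off.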



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)