hive-issues mailing list archives

From "Siddharth Seth (JIRA)" <>
Subject [jira] [Updated] (HIVE-14168) Avoid serializing all parameters from HiveConf.java into in-memory HiveConf instances
Date Wed, 06 Jul 2016 06:56:11 GMT


Siddharth Seth updated HIVE-14168:
    Issue Type: Improvement  (was: Bug)

> Avoid serializing all parameters from HiveConf.java into in-memory HiveConf instances
> -------------------------------------------------------------------------------------
>                 Key: HIVE-14168
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Priority: Critical
> All non-null parameters from HiveConf.java are explicitly set in each HiveConf instance.
> {code}
> // Overlay the ConfVars. Note that this ignores ConfVars with null values
>     addResource(getConfVarInputStream());
> {code}
> This unnecessarily bloats each Configuration object - 400+ conf variables are set instead of the probably <30 that would exist in hive-site.xml.
> Looking at an HS2 heap dump - HiveConf is almost always the largest component by a long way. Conf objects are also serialized very often, transmitting lots of unneeded variables (a serialized Hive conf typically holds 1000+ variables, because Hadoop injects its configs into every config instance).
> This is avoidable as long as HiveConf.get() is the approach used to read from a config, because the default can then be supplied at read time. Hive code itself should be doing this.
> This would be a potentially incompatible change for UDFs and other plugins which have access to a Configuration object.
> I'd suggest turning off this overlay of default values by default, and adding a flag to control it.
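
A minimal sketch of the idea in the description, for illustration only. It does not use the real HiveConf or Hadoop Configuration classes: `SparseConfSketch`, the `overlayDefaults` constructor flag, the `rawGet` accessor and the `example.*` keys are all hypothetical names invented here. It assumes defaults live in a static table and are resolved when a value is read, so an instance stores (and would serialize) only the values that were explicitly set.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a configuration class; NOT the real HiveConf API. */
final class SparseConfSketch {
    /** Hypothetical compiled-in defaults, playing the role of the ConfVars table. */
    static final Map<String, String> DEFAULTS = new LinkedHashMap<>();
    static {
        DEFAULTS.put("example.parallel", "false");
        DEFAULTS.put("example.reducers.max", "1000");
        DEFAULTS.put("example.fetch.mode", "more");
    }

    /** Values held by this instance; this is what would be kept on the heap and serialized. */
    private final Map<String, String> props = new HashMap<>();

    /** overlayDefaults is the hypothetical flag: true reproduces the eager copy of every default. */
    SparseConfSketch(boolean overlayDefaults) {
        if (overlayDefaults) {
            props.putAll(DEFAULTS);
        }
    }

    void set(String key, String value) {
        props.put(key, value);
    }

    /** Returns the explicit value if present, else the compiled-in default, else null. */
    String get(String key) {
        String value = props.get(key);
        return value != null ? value : DEFAULTS.get(key);
    }

    /** What a plugin sees if it reads the stored map directly and bypasses get(). */
    String rawGet(String key) {
        return props.get(key);
    }

    /** Number of entries actually stored in this instance. */
    int storedEntries() {
        return props.size();
    }

    public static void main(String[] args) {
        SparseConfSketch eager = new SparseConfSketch(true);
        SparseConfSketch sparse = new SparseConfSketch(false);
        sparse.set("example.reducers.max", "50");

        System.out.println("eager stores  " + eager.storedEntries() + " entries");
        System.out.println("sparse stores " + sparse.storedEntries() + " entries");
        System.out.println("sparse get    " + sparse.get("example.parallel"));
        System.out.println("sparse rawGet " + sparse.rawGet("example.parallel"));
    }
}
```

In this sketch a caller going through `get` sees the same values either way, while a caller reading the stored map directly gets null for anything that was never set. That is the compatibility risk for UDFs and plugins mentioned above, and the reason for keeping a flag.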

This message was sent by Atlassian JIRA
