hive-dev mailing list archives

From "Jason Dere (JIRA)"
Subject [jira] [Created] (HIVE-18252) Limit the size of the object inspector caches
Date Fri, 08 Dec 2017 03:50:00 GMT
Jason Dere created HIVE-18252:

             Summary: Limit the size of the object inspector caches
                 Key: HIVE-18252
             Project: Hive
          Issue Type: Bug
          Components: Types
            Reporter: Jason Dere
            Assignee: Jason Dere

While running some tests that had a lot of queries with constant values, I noticed that
ObjectInspectorFactory.cachedStandardStructObjectInspector started using up a lot of memory.

It appears that StructObjectInspector caching does not work properly with constant values.
Constant ObjectInspectors are not cached, so each constant expression creates a new constant
ObjectInspector. And since object inspectors do not override equals(), object inspector comparison
relies on object instance comparison. So even if the values are exactly the same as what is
already in the cache, the StructObjectInspector cache lookup would fail, and Hive would create
a new object inspector and add it to the cache, creating another entry that would never be
used. Plus, there is no max cache size - it's just a map that is allowed to grow as long as
values keep getting added to it.
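
The failure mode described above can be reproduced with a small, self-contained sketch. This is
not Hive code: ConstantInspector, StructInspector, and the CACHE map are hypothetical stand-ins,
and the cache key is simplified to the list of field inspectors. It only illustrates how a key
built from objects that do not override equals()/hashCode() misses on every lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class IdentityKeyCacheDemo {
    // Hypothetical stand-in for a constant object inspector.
    // It deliberately does NOT override equals()/hashCode(), so two
    // instances holding the same value are still considered different.
    static final class ConstantInspector {
        final Object value;
        ConstantInspector(Object value) { this.value = value; }
    }

    // Hypothetical stand-in for a struct object inspector.
    static final class StructInspector {
        final List<ConstantInspector> fields;
        StructInspector(List<ConstantInspector> fields) { this.fields = fields; }
    }

    // Unbounded cache keyed by the list of field inspectors (an assumption
    // made for this sketch; the real cache key is more involved).
    static final Map<List<ConstantInspector>, StructInspector> CACHE =
            new ConcurrentHashMap<>();

    static StructInspector getStructInspector(List<ConstantInspector> fields) {
        // List.equals() compares elements with equals(), which falls back
        // to instance identity here, so this lookup never hits.
        return CACHE.computeIfAbsent(fields, StructInspector::new);
    }

    // Simulates `queries` queries that all use the same constant value.
    static int cacheSizeAfter(int queries) {
        CACHE.clear();
        for (int i = 0; i < queries; i++) {
            List<ConstantInspector> fields = new ArrayList<>();
            fields.add(new ConstantInspector(42)); // new instance every time
            getStructInspector(fields);
        }
        return CACHE.size();
    }

    public static void main(String[] args) {
        // One cache entry per query, even though every constant is 42.
        System.out.println(cacheSizeAfter(1000)); // prints 1000
    }
}
```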

Some possible solutions I can think of:
1. Limit the size of the object inspector caches, rather than growing without bound.
2. Try to fix the caching to work with constant values. This would require implementing equals()
on the constant object inspectors (which could be slow in nested cases), or else we would
have to start caching constant object inspectors, which could be expensive in terms of memory
usage. Could be used in combination with (1). By itself this is not a great solution because
this still has the unbounded cache growth issue.
3. Disable caching in the case of constant object inspectors since this scenario currently
doesn't work. This could be used in combination with (1).
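
As a rough illustration of option (1), a size-limited cache could look something like the sketch
below. It uses the JDK's LinkedHashMap in access order with removeEldestEntry() to get LRU
eviction. The BoundedCache class and the limit of 100 entries are assumptions made for this
example, not what Hive uses or will use, and a shared cache would additionally need
synchronization (for example Collections.synchronizedMap) or a concurrent cache library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical size-limited LRU cache, for illustration only.
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        // accessOrder = true: iteration order is least-recently-used first.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true evicts the
        // least-recently-used entry, so the map never exceeds maxEntries.
        return size() > maxEntries;
    }

    // Inserts `inserts` entries under keys that never match each other
    // (plain Object instances compare by identity) and reports the size.
    static int sizeAfterInserts(int maxEntries, int inserts) {
        BoundedCache<Object, String> cache = new BoundedCache<>(maxEntries);
        for (int i = 0; i < inserts; i++) {
            cache.put(new Object(), "inspector-" + i);
        }
        return cache.size();
    }

    public static void main(String[] args) {
        // Even with 10000 never-matching keys, the cache stays at the limit.
        System.out.println(sizeAfterInserts(100, 10000)); // prints 100
    }
}
```

With a bound like this, entries that are never looked up again (such as the ones created for
constant values) are eventually evicted instead of accumulating.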
