hive-issues mailing list archives

From "Sankar Hariappan (JIRA)" <>
Subject [jira] [Commented] (HIVE-20192) HS2 with embedded metastore is leaking JDOPersistenceManager objects.
Date Fri, 20 Jul 2018 06:43:00 GMT


Sankar Hariappan commented on HIVE-20192:

Thanks for the feedback [~vihangk1]!


I see that in the initializeHelper method, if there is an exception, you issue a shutdown
on the ObjectStore to clean up the persistenceManager. But shouldn't this uncaught exception cause
the thread to be closed in the first place, thereby cleaning up the threadlocal rawstore?


- The PersistenceManagerFactory object "pmf" is static and keeps a reference to every
allocated PersistenceManager in its pmCache map. That is why a PersistenceManager is never
GC'ed and needs an explicit shutdown on any exception. In this case we retry instead of closing
the thread, which overwrites the pm object and leaks the old one.
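
To illustrate the leak described above, here is a minimal sketch. FakePmf, FakePm and FakeStore
are hypothetical stand-ins, not the DataNucleus or Hive classes; the sketch only assumes that a
static factory caches every manager it hands out and drops it from the cache on close().

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a factory that tracks every open manager.
class FakePmf {
    // Plays the role of pmCache: the factory holds a reference to each manager.
    final Set<FakePm> pmCache = ConcurrentHashMap.newKeySet();

    FakePm getPersistenceManager() {
        FakePm pm = new FakePm(this);
        pmCache.add(pm);
        return pm;
    }
}

// Hypothetical stand-in for a persistence manager.
class FakePm {
    private final FakePmf owner;

    FakePm(FakePmf owner) {
        this.owner = owner;
    }

    // Only close() removes the manager from the factory's cache.
    void close() {
        owner.pmCache.remove(this);
    }
}

// Hypothetical stand-in for a store that owns one manager at a time.
class FakeStore {
    static final FakePmf pmf = new FakePmf(); // static, so it acts as a GC root
    private FakePm pm;

    // Leaky retry: the old pm reference is overwritten but stays in pmCache.
    void initializeLeaky() {
        pm = pmf.getPersistenceManager();
    }

    // Safe retry: release the previous pm before replacing it.
    void initializeSafe() {
        shutdown();
        pm = pmf.getPersistenceManager();
    }

    void shutdown() {
        if (pm != null) {
            pm.close();
            pm = null;
        }
    }
}
```

In this sketch, each call to initializeLeaky() adds one more entry to pmCache, while repeated
calls to initializeSafe() keep at most one entry per store.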


Is it better to issue a shutdown on the threadlocal rawstore from {{ThriftBinaryCLIService#deleteContext}}
method instead?


- That's a good point. But I'm not sure whether there is a reason for keeping the current implementation
with threadRawStoreMap.


Based on my understanding, it looks like we are trying to keep track of the threadlocal rawstore
in a map, using a custom implementation of Thread, and depend on the finalize method to do cleanup.
In theory this means that cleanup only happens when the threads are GCed, instead of
as soon as sessions are closed. Also, if a thrift thread is reused, there would already
be an entry in the {{threadRawStoreMap}}, and {{cacheThreadLocalRawStore}} will overwrite that
entry, which can also cause a leak. This can potentially be verified by setting min threads
and max threads to the same value (so no thread is ever GCed) and repeatedly opening and closing
connections to HMS; eventually these threadLocalRawstore objects should pile up.


- I think overwriting the entry in cacheThreadLocalRawStore doesn't cause a leak, because
it overwrites the entry with the thread-local rawStore that is active in this thread. If the thread-local
rawStore has changed, it means the older one was already shut down gracefully before being re-created.
Also, threadRawStoreMap shouldn't pile up, as we use the same thread id.
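
A minimal sketch of this behaviour, using a hypothetical RawStoreRegistry class rather than the
actual Hive code: a map keyed by thread id holds at most one entry per thread, so overwriting
does not grow the map. Shutting down the displaced store inside cache() is a defensive assumption
of the sketch, not something the current implementation is claimed to do.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry that mimics a thread-id-keyed store map.
class RawStoreRegistry {
    // Hypothetical stand-in for a RawStore; it only records whether it was shut down.
    static class Store {
        boolean shutDown = false;

        void shutdown() {
            shutDown = true;
        }
    }

    private final Map<Long, Store> threadRawStoreMap = new ConcurrentHashMap<>();

    // Records the store that is active for the given thread id.
    // put() replaces any existing entry for that id, so the map cannot grow per thread.
    void cache(long threadId, Store current) {
        Store previous = threadRawStoreMap.put(threadId, current);
        if (previous != null && previous != current) {
            // Defensive assumption: release a displaced store in case it is still open.
            previous.shutdown();
        }
    }

    int size() {
        return threadRawStoreMap.size();
    }
}
```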

Please let me know if I missed anything.

> HS2 with embedded metastore is leaking JDOPersistenceManager objects.
> ---------------------------------------------------------------------
>                 Key: HIVE-20192
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 3.0.0, 3.1.0, 4.0.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: HiveServer2, pull-request-available
>             Fix For: 4.0.0
>         Attachments: HIVE-20192.01.patch
> HiveServer2 instances were crashing every 3-4 days, and HS2 was observed in an unresponsive
state. Full GC (FGC) was also observed to happen regularly.
> The JXray report shows that pmCache (a list of JDOPersistenceManager objects) occupies
84% of the heap, and there are around 16,000 references to UDFClassLoader.
> {code:java}
> 10,759,230K (84.7%) Object tree for GC root(s) Java Static org.apache.hadoop.hive.metastore.ObjectStore.pmf
> - org.datanucleus.api.jdo.JDOPersistenceManagerFactory.pmCache ↘ 10,744,419K (84.6%),
1 reference(s)
>   - j.u.Collections$SetFromMap.m ↘ 10,744,419K (84.6%), 1 reference(s)
>     - {java.util.concurrent.ConcurrentHashMap}.keys ↘ 10,743,764K (84.5%), 16,872 reference(s)
>       - ↘ 10,738,831K (84.5%), 16,872
>         ... 3 more references together retaining 4,933K (< 0.1%)
>     - java.util.concurrent.ConcurrentHashMap self 655K (< 0.1%), 1 object(s)
>       ... 2 more references together retaining 48b (< 0.1%)
> - org.datanucleus.api.jdo.JDOPersistenceManagerFactory.nucleusContext ↘ 14,810K (0.1%),
1 reference(s)
> ... 3 more references together retaining 96b (< 0.1%){code}
> When the RawStore object is re-created, it is not allowed to be updated in ThreadWithGarbageCleanup.threadRawStoreMap,
so the new RawStore never gets cleaned up when the thread exits.

This message was sent by Atlassian JIRA
