hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sunitha Beeram (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-16844) Fix Connection leak in ObjectStore when new Conf object is used
Date Sat, 08 Jul 2017 06:58:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-16844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16079001#comment-16079001
] 

Sunitha Beeram commented on HIVE-16844:
---------------------------------------

Thanks [~mithun] for the review.

I am still getting familiar with the Hive code base, so do pardon my ignorance (and the length
of this comment)! Later in this comment I have more notes on the connection leak itself, but
essentially you are right - we are pulling the plug on the connections while another thread
is possibly using it.
Also, on closer inspection of the code/comments I came across [this|https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L8344]:
{quote}
   * The NucleusContext cache gets freed up only on calling a close on it.
   * We're not closing NucleusContext since it does a bunch of other things which we don't
want.
{quote}

It looks like a decision was made earlier to not close...

I am not sure yet if there is a better way to handle it, however, there are a few questions
that pop up:
- When the pmf reference is cleared, it seems like the assumption is that the earlier connection
pool is GC'ed - is that right? It looks like thats not occurring in our case. We are using
BoneCP(!), but I do recollect trying HikariCP as well and the problem was reproducible. However,
I did not probe further to see if Hikari offers idle connection cleanup etc. My unvalidated
assumption is that we have min connections in the pool as 1 and with no idle connection reclamation,
we have active references thats preventing GC...
- Another way of looking at the issue, albeit contrived, is that the connection being used
by a thread can become invalid if the remote end closes it for whatever reason and the application
should deal with it gracefully - and the connection pool getting closed is a degenerate instance
of this. So, if we have code similar to what we have in RetryingHMSHandler, it would be handle
the state invalidation gracefully, right?
- Another issue with not closing the pool is that we now have parts of code using different
JDO config while the other thread(s) could continue to use older config indefinitely. Thats
perhaps not desirable either, right?

Early last week I had spent some time looking through the code a bit more, as up until then
I wasn't sure what execution paths actually led to the leak. I believe one possibility is
this:
a) Most calls to the Metastore go through RetryingHMSHandler that has the logic to retry and
reload the configuration in case of errors. So, this code path sees new configuration if one
is available soon after a DB error.
b) Calls going through DBTokenStore, access RawStore directly and hence not only miss out
on the retrying logic, but also continue to use old configuration through a HiveConf object
instantiated at startup.

It looks like in our case, with frequent calls to getDelegationToken, we end up with cases
where (a) and (b) interleave, resulting in connection pool objects getting leaked.

I am just beginning to catch up on the implementation of HCatClientHMSImpl/HCatUtil and will
need to look through a bit more through that code on the JDO configuration that will ultimately
get used etc.

I do appreciate any comments/pointers you have regarding this. Also, let me know if you think
its better to revert the change until this is sorted out.



> Fix Connection leak in ObjectStore when new Conf object is used
> ---------------------------------------------------------------
>
>                 Key: HIVE-16844
>                 URL: https://issues.apache.org/jira/browse/HIVE-16844
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>            Reporter: Sunitha Beeram
>            Assignee: Sunitha Beeram
>             Fix For: 3.0.0
>
>         Attachments: HIVE-16844.1.patch
>
>
> The code path in ObjectStore.java currently leaks BoneCP (or Hikari) connection pools
when a new configuration object is passed in. The code needs to ensure that the persistence-factory
is closed before it is nullified.
> The relevant code is [here|https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L290].
Note that pmf is set to null, but the underlying connection pool is not closed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message