incubator-hcatalog-dev mailing list archives

From "Mithun Radhakrishnan (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HCATALOG-183) Memory leak in HCat 0.1/0.2
Date Fri, 16 Dec 2011 07:41:30 GMT

    [ https://issues.apache.org/jira/browse/HCATALOG-183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170814#comment-13170814 ]

Mithun Radhakrishnan commented on HCATALOG-183:
-----------------------------------------------

It looks like the leak is in libthrift. (I'll be raising a JIRA against thrift shortly.)

I took a heap-snapshot of the running hcatalog-server process and examined its contents. Most
of the memory is held by org.apache.thrift.transport.TSaslServerTransport$Factory::transportMap.
(This is a WeakHashMap, mapping TTransport objects to their wrapped TSaslServerTransport instances.)
The snapshot from the hcat_server in Yahoo-production showed that the transportMap instance
held 52000+ entries, with a shallow-heap size of 3MB but a retained-heap size of 1.3GB.
So it looks like the WeakHashMap$Entry objects aren't being GCed.

From the code in TSaslTransport and TSaslServerTransport, the following might be why the
objects persist:
1. TSaslTransport contains a hard-reference to TTransport (i.e. the object it is wrapping).
2. TSaslServerTransport$Factory contains a (static) WeakHashMap<TTransport, TSaslServerTransport>.

A WeakHashMap holds its keys weakly: once a key has no outstanding hard-references, the
runtime is free to collect it, and the map then drops the corresponding entry.

The problem is that in this WeakHashMap<Key, Value>, the value has a hard-reference
back to the key. (From #1.) The map itself holds each value through a hard-reference, so the
key always stays reachable via map -> value -> key. While this is a cyclic reference, I think
it isn't one the runtime can break. And since the key can't be collected, the entry persists
for all time.

I've verified this behaviour with sample-code, on JRE1.6. I've also verified that changing
the Value's back-reference to the Key into a WeakReference<Key> lets the runtime collect
the key, and with it the entry. I'll make the sample-code available in the thrift-JIRA.
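
For anyone who wants to reproduce the effect without thrift on the classpath, here is a
minimal sketch of the kind of test described above. It is not the sample-code from the
thrift-JIRA. Key, StrongValue, WeakValue and WeakMapLeakDemo are hypothetical stand-ins
(Key for TTransport, StrongValue for a wrapper that hard-references what it wraps), and
System.gc() is only a hint to the runtime, so the weak case is retried a few times.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

final class WeakMapLeakDemo {
    /** Hypothetical stand-in for the wrapped transport (the map key). */
    static final class Key { }

    /** Value with a hard-reference back to its key: the leaking shape. */
    static final class StrongValue {
        final Key key;
        StrongValue(Key key) { this.key = key; }
    }

    /** Value that only weakly references its key: the proposed shape. */
    static final class WeakValue {
        final WeakReference<Key> key;
        WeakValue(Key key) { this.key = new WeakReference<>(key); }
    }

    /** Fills the map in its own stack frame so no local variable keeps a key alive. */
    private static void fill(Map<Key, Object> map, int n, boolean strongBackReference) {
        for (int i = 0; i < n; i++) {
            Key key = new Key();
            map.put(key, strongBackReference ? new StrongValue(key) : new WeakValue(key));
        }
    }

    /** Returns how many of n entries remain after repeatedly requesting a GC. */
    static int survivors(boolean strongBackReference, int n) {
        Map<Key, Object> map = new WeakHashMap<>();
        fill(map, n, strongBackReference);
        for (int attempt = 0; attempt < 20 && !map.isEmpty(); attempt++) {
            System.gc();              // a request only; the runtime may ignore it
            try {
                Thread.sleep(50);     // give reference processing time to run
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return map.size();            // size() also expunges stale entries
    }

    public static void main(String[] args) {
        System.out.println("hard back-reference: " + survivors(true, 1000) + " entries left");
        System.out.println("weak back-reference: " + survivors(false, 1000) + " entries left");
    }
}
```

With the hard back-reference, all 1000 entries remain no matter how often the GC runs,
because each key is reachable through its own value. With the WeakReference, the count
normally drops to 0 on a HotSpot JVM, although that depends on the runtime honouring the
System.gc() request.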

Please note that this is not exclusive to libthrift 0.5.0 (on which HCatalog depends). The
code persists on 0.7.0 (on which Hive-trunk runs) and on 0.9 (the latest release). If this
fix is accepted, we might have to consider depending on a newer thrift-library.
                
> Memory leak in HCat 0.1/0.2
> ---------------------------
>
>                 Key: HCATALOG-183
>                 URL: https://issues.apache.org/jira/browse/HCATALOG-183
>             Project: HCatalog
>          Issue Type: Bug
>          Components: metastore
>    Affects Versions: 0.2
>            Reporter: Mithun Radhakrishnan
>              Labels: OOM, thrift
>
> When one leaves the HCatalog server running for long (in a secure setup), with requests
> continuously coming in, one sees that the memory footprint of the metastore-server increases
> continuously, until it culminates in an OutOfMemoryError:
> <backtrace>
> 2011-12-01 18:11:00,620 ERROR api.ThriftHiveMetastore$Processor (ThriftHiveMetastore.java:process(5949)) - Internal error processing get_partition_names
> java.lang.OutOfMemoryError: Java heap space 
>   at java.util.Arrays.copyOf(Arrays.java:2882)
>   at java.lang.StringValue.from(StringValue.java:24)
>   at java.lang.String.<init>(String.java:178)
>   at com.mysql.jdbc.SingleByteCharsetConverter.toString(SingleByteCharsetConverter.java:286)
>   at com.mysql.jdbc.SingleByteCharsetConverter.toString(SingleByteCharsetConverter.java:262)
>   at com.mysql.jdbc.ResultSet.getStringInternal(ResultSet.java:5671)
>   at com.mysql.jdbc.ResultSet.getString(ResultSet.java:5544)
>   at org.apache.commons.dbcp.DelegatingResultSet.getString(DelegatingResultSet.java:213)
>   at org.apache.commons.dbcp.DelegatingResultSet.getString(DelegatingResultSet.java:213)
>   at org.datanucleus.store.rdbms.mapping.CharRDBMSMapping.getObject(CharRDBMSMapping.java:460)
>   at org.datanucleus.store.mapped.mapping.SingleFieldMapping.getObject(SingleFieldMapping.java:216)
>   at org.datanucleus.store.rdbms.query.ResultClassROF.processScalarExpression(ResultClassROF.java:583)
>   at org.datanucleus.store.rdbms.query.ResultClassROF.getObject(ResultClassROF.java:361)
>   at org.datanucleus.store.rdbms.query.legacy.LegacyForwardQueryResult.nextResultSetElement(LegacyForwardQueryResult.java:137)
>   at org.datanucleus.store.rdbms.query.legacy.LegacyForwardQueryResult$QueryResultIterator.next(LegacyForwardQueryResult.java:305)
>   at org.apache.hadoop.hive.metastore.ObjectStore.listPartitionNames(ObjectStore.java:1200)
>   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$26.run(HiveMetaStore.java:1555)
>   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$26.run(HiveMetaStore.java:1552)
>   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.executeWithRetry(HiveMetaStore.java:309)
>   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partition_names(HiveMetaStore.java:1552)
>   ...
> </backtrace>
> The OOM is preceded by other failures, including a "GSS initiate failure" (in spite of
> a client-side kinit), and an "Error occurred during processing of request".

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
