hive-dev mailing list archives

From "Mithun Radhakrishnan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-3098) Memory leak from large number of FileSystem instances in FileSystem.CACHE. (Must cache UGIs.)
Date Wed, 01 Aug 2012 18:29:04 GMT

     [ https://issues.apache.org/jira/browse/HIVE-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mithun Radhakrishnan updated HIVE-3098:
---------------------------------------

    Attachment: Hive-3098_(FS_closeAllForUGI()).patch

Good news.

After applying the attached change (i.e. using FS.closeAllForUGI(), as was intended), I
see that the memory footprint no longer grows as before. FS handles are now cleaned up properly.


I left my stress-test running overnight. The memory footprint is still larger than when
running with the UGICache (22MB with the cache vs 75MB without), but it doesn't grow to the
ludicrous proportions of the past (2GB). I'm going to put the increase down to heap
fragmentation caused by new object creation (avoidable with the cache).
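
For readers who want to see the shape of the cleanup, here is a minimal sketch. It does not
call Hadoop: FsCacheSketch, Key and closeAllFor() are hypothetical stand-ins for
FileSystem.CACHE, its cache key and FS.closeAllForUGI(), and a fresh Object stands in for the
per-request UGI. It only illustrates why evicting every handle owned by a UGI at the end of a
request keeps the cache from growing.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical stand-in for a FileSystem cache keyed by (scheme, owner).
final class FsCacheSketch {
    static final class Key {
        final String scheme;
        final Object owner; // stand-in for the UGI, compared by identity
        Key(String scheme, Object owner) { this.scheme = scheme; this.owner = owner; }
        @Override public boolean equals(Object o) {
            return o instanceof Key
                && ((Key) o).scheme.equals(scheme)
                && ((Key) o).owner == owner;
        }
        @Override public int hashCode() {
            return 31 * scheme.hashCode() + System.identityHashCode(owner);
        }
    }

    private final Map<Key, String> handles = new HashMap<>();

    // Returns the cached handle for this owner, creating one on a miss.
    String get(String scheme, Object owner) {
        return handles.computeIfAbsent(new Key(scheme, owner), k -> scheme + "-handle");
    }

    // Drops every handle that belongs to the given owner.
    void closeAllFor(Object owner) {
        Iterator<Map.Entry<Key, String>> it = handles.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getKey().owner == owner) {
                it.remove();
            }
        }
    }

    int size() { return handles.size(); }

    // Simulates `requests` calls, each with a brand-new owner object.
    static int sizeAfter(int requests, boolean cleanUp) {
        FsCacheSketch cache = new FsCacheSketch();
        for (int i = 0; i < requests; i++) {
            Object owner = new Object();
            try {
                cache.get("hdfs", owner);
            } finally {
                if (cleanUp) {
                    cache.closeAllFor(owner);
                }
            }
        }
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfter(1000, false)); // prints 1000
        System.out.println(sizeAfter(1000, true));  // prints 0
    }
}
```

The handles are still created and destroyed on every request, which is consistent with the
extra object churn mentioned above.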

Review, please?
                
> Memory leak from large number of FileSystem instances in FileSystem.CACHE. (Must cache UGIs.)
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-3098
>                 URL: https://issues.apache.org/jira/browse/HIVE-3098
>             Project: Hive
>          Issue Type: Bug
>          Components: Shims
>    Affects Versions: 0.9.0
>         Environment: Running with Hadoop 20.205.0.3+ / 1.0.x with security turned on.
>            Reporter: Mithun Radhakrishnan
>            Assignee: Mithun Radhakrishnan
>         Attachments: Hive-3098_(FS_closeAllForUGI()).patch, Hive_3098.patch
>
>
> The problem manifested while stress-testing HCatalog 0.4.1 (as part of testing the Oracle
> backend).
> The HCatalog server ran out of memory (-Xmx2048m) in under 24 hours when pounded by 60
> threads. The heap-dump indicates that hadoop::FileSystem.CACHE held 1,000,000 instances of
> FileSystem, whose combined retained memory consumed the entire heap.
> It boiled down to hadoop::UserGroupInformation::equals() being implemented such that
> the "Subject" member is compared by reference ("=="), and not for equivalence (".equals()").
> This causes equivalent UGI instances to compare as unequal, so every lookup misses the cache
> and a new FileSystem instance is created and cached.
> UGI.equals() is implemented this way, incidentally, as a fix for yet another problem
> (HADOOP-6670), so it is unlikely that the implementation can be modified.
> The solution for this is to check for UGI equivalence in HCatalog (i.e. in the Hive
> metastore), using a cache for UGI instances in the shims.
> I have a patch to fix this. I'll upload it shortly. I just ran an overnight test to confirm
> that the memory-leak has been arrested.
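
The mechanism described in the issue can be reproduced without Hadoop. The following is a
minimal sketch under stated assumptions: FakeSubject, FakeUgi and UgiLeakDemo are hypothetical
stand-ins (not Hadoop classes), a plain HashMap plays the role of FileSystem.CACHE, and a
second map keyed by user name plays the role of the proposed UGI cache. The actual patch may
be structured differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the "Subject" member: equivalent when user names match.
final class FakeSubject {
    final String user;
    FakeSubject(String user) { this.user = user; }
    @Override public boolean equals(Object o) {
        return o instanceof FakeSubject && ((FakeSubject) o).user.equals(user);
    }
    @Override public int hashCode() { return user.hashCode(); }
}

// Hypothetical stand-in for a UGI whose equals() compares the subject with "==".
final class FakeUgi {
    final FakeSubject subject;
    FakeUgi(String user) { this.subject = new FakeSubject(user); }
    @Override public boolean equals(Object o) {
        return o instanceof FakeUgi && ((FakeUgi) o).subject == subject;
    }
    @Override public int hashCode() { return System.identityHashCode(subject); }
}

final class UgiLeakDemo {
    // A new UGI per request: every lookup misses, so the cache grows without bound.
    static int leakyCacheSize(int requests) {
        Map<FakeUgi, Object> fsCache = new HashMap<>();
        for (int i = 0; i < requests; i++) {
            FakeUgi ugi = new FakeUgi("alice");
            fsCache.computeIfAbsent(ugi, k -> new Object());
        }
        return fsCache.size();
    }

    // Reusing one UGI instance per user: lookups hit, so one entry is kept.
    static int cachedUgiCacheSize(int requests) {
        Map<String, FakeUgi> ugiCache = new HashMap<>();
        Map<FakeUgi, Object> fsCache = new HashMap<>();
        for (int i = 0; i < requests; i++) {
            FakeUgi ugi = ugiCache.computeIfAbsent("alice", FakeUgi::new);
            fsCache.computeIfAbsent(ugi, k -> new Object());
        }
        return fsCache.size();
    }

    public static void main(String[] args) {
        System.out.println(leakyCacheSize(1000));     // prints 1000
        System.out.println(cachedUgiCacheSize(1000)); // prints 1
    }
}
```

Keying the UGI cache on something with value semantics (here, the user name) is what turns
equivalent requests into identical UGI instances, which the identity-based equals() then
treats as equal.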

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
