hadoop-common-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12707) key of FileSystem inner class Cache contains UGI.hashCode which uses the default hashCode method, leading to a memory leak
Date Thu, 14 Jan 2016 17:52:40 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098508#comment-15098508
] 

Chris Nauroth commented on HADOOP-12707:
----------------------------------------

bq. ...we could consider using weak references, or at least having some idleness

I thought about weak references at one point but abandoned it for a few reasons:

# Weak references would provide different semantics compared to the current cache implementation.
 Right now, applications can get a {{FileSystem}}, use it, abandon it for a long time (no
longer holding a reference to it), and then come back and still find that instance sitting
in the cache.  With weak references, there would be a race condition if a GC swept away the
instance before the application came back to claim it again.  This could change performance
and load patterns in negative ways if clients need to reconnect sockets.  Whether or not it
would really be problematic in practice is unclear, but there is a ton of ecosystem and application
code that would need testing.  Any such testing would be difficult and perhaps unconvincing,
because it would be dependent on external factors like heap configuration, and ultimately,
the timing of GC is non-deterministic anyway.  (A minimal sketch of this race appears just after this list.)
# The cache goes beyond just resource management and is actually coupled to some logic that
implements the API contract.  Delete-on-exit becomes problematic.  With weak references, I
suppose we'd have to trigger the deletes from {{finalize}}.  I've experienced a lot of bugs
in other codebases that rely on a finalizer to do significant work, so I have an immediate
aversion to this.  I also worry about tricky interactions with the {{ClientFinalizer}} shutdown
hook.  I guess an alternative could be to somehow "resurrect" a reaped instance that still
has pending delete-on-exit work, but I expect this would be complex.
# Even assuming we do a correct bug-free implementation of delete-on-exit from a finalizer,
it still changes the semantics.  The deletions are triggered from the {{close}} method.  Many
applications don't bother calling {{close}} at all.  In that case, the deletes would happen
during the {{ClientFinalizer}} shutdown hook.  Effectively, these applications expect that
delete-on-exit means delete-on-process-exit.  They might even be calling {{cancelDeleteOnExit}}
to cancel a prior queued delete.  If a weak-referenced instance drops out because of GC, then
the deletes would happen at an unexpected time, all of that prior state would be lost, and
the application would lose its opportunity to call {{cancelDeleteOnExit}}.
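
To make the first point concrete, here is a minimal sketch of a weak-valued cache.  All names are hypothetical; {{Connection}} just stands in for an expensive {{FileSystem}} instance, and this is not the actual cache code:

{code:java}
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy weak-valued cache.  Connection is a hypothetical stand-in for a cached
// FileSystem instance; constructing one is assumed to be expensive.
public class WeakValueCacheDemo {

  static class Connection {
    Connection() { System.out.println("opening sockets (expensive)"); }
  }

  private static final Map<String, WeakReference<Connection>> CACHE =
      new ConcurrentHashMap<>();

  static synchronized Connection get(String key) {
    WeakReference<Connection> ref = CACHE.get(key);
    Connection conn = (ref == null) ? null : ref.get();
    if (conn == null) {
      // The entry may have been cleared by GC at any point while the
      // application held no strong reference, so the caller silently pays
      // the reconnection cost again.
      conn = new Connection();
      CACHE.put(key, new WeakReference<>(conn));
    }
    return conn;
  }

  public static void main(String[] args) {
    get("hdfs://nn:8020#alice");  // first use: opens sockets
    // ...application drops its reference and goes idle for a while...
    System.gc();                  // only a hint; GC timing is non-deterministic
    get("hdfs://nn:8020#alice");  // may open sockets again if the reference was cleared
  }
}
{code}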

An idleness policy would have similar challenges.  It's hard to identify idleness correctly
within the {{FileSystem}} layer, because current applications expect they can come back to
the cache at any arbitrary time and get the same instance again.
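
For comparison, a hypothetical idle-expiry cache (sketched here with Guava's {{CacheBuilder}} purely for illustration; this is not how {{FileSystem.Cache}} works) breaks the same "come back later, get the same instance" expectation, just deterministically by time instead of by GC:

{code:java}
import java.util.concurrent.TimeUnit;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

public class IdleEvictionDemo {
  // Hypothetical stand-in for a cached FileSystem instance.
  static class FsInstance {}

  public static void main(String[] args) throws Exception {
    LoadingCache<String, FsInstance> cache = CacheBuilder.newBuilder()
        .expireAfterAccess(1, TimeUnit.SECONDS)   // the "idleness" policy
        .build(new CacheLoader<String, FsInstance>() {
          @Override
          public FsInstance load(String key) {
            return new FsInstance();
          }
        });

    FsInstance first = cache.get("hdfs://nn:8020#alice");
    Thread.sleep(2000);   // application goes idle longer than the window
    FsInstance second = cache.get("hdfs://nn:8020#alice");

    // The caller expected the same instance back; instead it gets a new one,
    // and any state attached to the old instance (e.g. pending delete-on-exit
    // paths) is gone.
    System.out.println(first == second);   // false
  }
}
{code}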

Effectively, this cache isn't just a cache.  It's really a set of global variables where clients
expect that they can do stateful interactions.  Pitfalls of global variables, blah blah blah...
 :-)

I don't like the current implementation, but I also don't see a safe way forward that is backwards-compatible.
 If I had my preference, we'd re-implement it with explicit reference counting, require by
contract that all callers call {{close}} when finished, and perhaps try to scrap the delete-on-exit
feature entirely.  There has been pushback on the reference-counting proposal in the past,
but I don't remember the exact JIRA.
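
A rough sketch of what the reference-counting shape could look like (names and interfaces are illustrative only, not a proposed {{FileSystem}} API); the key point is that teardown is deterministic and happens on the last {{release}}, not in a finalizer:

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

// Illustrative reference-counted cache: callers must balance every acquire()
// with a release() (i.e. close()); the cached value is disposed only when the
// last reference is released.
public class RefCountedCache<K, V> {

  private static final class Entry<V> {
    final V value;
    int refCount;
    Entry(V value) { this.value = value; }
  }

  private final Map<K, Entry<V>> entries = new HashMap<>();
  private final Function<K, V> factory;
  private final Consumer<V> disposer;

  public RefCountedCache(Function<K, V> factory, Consumer<V> disposer) {
    this.factory = factory;
    this.disposer = disposer;
  }

  public synchronized V acquire(K key) {
    Entry<V> e = entries.get(key);
    if (e == null) {
      e = new Entry<>(factory.apply(key));
      entries.put(key, e);
    }
    e.refCount++;
    return e.value;
  }

  public synchronized void release(K key) {
    Entry<V> e = entries.get(key);
    if (e == null) {
      return;  // unbalanced release; real code would log or throw
    }
    if (--e.refCount == 0) {
      entries.remove(key);
      disposer.accept(e.value);  // deterministic teardown, no finalizer involved
    }
  }
}
{code}

With that shape, delete-on-exit (if it were kept at all) would have a well-defined trigger point: the final {{release}} for a key, or the shutdown hook.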

Maybe explore backwards-incompatible changes for 3.x?

> key of FileSystem inner class Cache contains UGI.hashCode which uses the default hashCode method, leading to a memory leak
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-12707
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12707
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.7.1
>            Reporter: sunhaitao
>            Assignee: sunhaitao
>
> The FileSystem.get(conf) method by default gets the fs object from CACHE, but the key of
the CACHE contains ugi.hashCode, which uses the default (identity) hashCode of the Subject
instead of the hashCode method overridden by Subject.
>     @Override
>     public int hashCode() {
>       return (scheme + authority).hashCode() + ugi.hashCode() + (int)unique;
>     }
> In this case, even for the same user, if FileSystem.get(conf) is called twice, two different
keys will be created. Over a long duration, this leads to a memory leak.
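
To illustrate the report above, a hypothetical snippet (assuming hadoop-common is on the classpath; {{createRemoteUser}} is used only to obtain two distinct UGI instances for the same user name):

{code:java}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;

public class CacheKeyDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Two UGI instances for the same user name: UGI equality is based on the
    // identity of the wrapped Subject, so they never compare equal.
    UserGroupInformation ugi1 = UserGroupInformation.createRemoteUser("alice");
    UserGroupInformation ugi2 = UserGroupInformation.createRemoteUser("alice");
    System.out.println(ugi1.equals(ugi2));  // false

    // Because Cache.Key folds the UGI into its hashCode/equals, each call
    // below creates a new entry in FileSystem.CACHE that is never matched
    // again and never evicted unless close()/closeAll() is called.
    FileSystem fs1 = ugi1.doAs(
        (PrivilegedExceptionAction<FileSystem>) () -> FileSystem.get(conf));
    FileSystem fs2 = ugi2.doAs(
        (PrivilegedExceptionAction<FileSystem>) () -> FileSystem.get(conf));
    System.out.println(fs1 == fs2);  // false: two cache entries for the same user
  }
}
{code}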



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
