hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
Date Wed, 24 Jun 2015 18:10:05 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599861#comment-14599861

Colin Patrick McCabe commented on HADOOP-12107:

bq. 003 patch is good. I just don't understand why 001 patch creates one additional thread
per FileSystem object? I look at FileSystem#getStaticstics(..). I think it creates one Staticstics
object per FileSystem class, so 001 patch creates one additional thread per FileSystem class?
I'll be grateful if somebody can guide me through this.

D'oh.  You're absolutely right... there is only one thread per FileSystem class.  Forget what
I said earlier.

bq. One other point of discussion: I'm removing the StatisticsData constructor that takes
a Thread as the argument (along with the owner member variable) as it no longer needs the
thread inside StatisticsData. But since StatisticsData is technically a public class, it could
be considered an API incompatible change. Thoughts on this? No one outside FileSystem is using
the constructor or the member variable, and I cannot think of why anyone would. But if we
need to absolutely maintain the API compatibility, I cannot remove them. Let me know what
you think.

StatisticsData is a public class, but its constructor is not public.  So there is no possible
API breakage here.  (As a side note, even if StatisticsData were public, it doesn't have an
interface annotation specifying that it is safe to use outside of Hadoop, so we're doubly
safe here.)

2921              /**
2922	       * This constructor is deprecated and is no longer used. One should remove
2923	       * any use of this constructor.
2924	       */
2925	      @Deprecated
2926	      StatisticsData(WeakReference<Thread> owner) {

Like I explained above, we don't need this.  Let's get rid of it.

StatisticsDataReference: let's put an {{\@Override}} annotation on the functions which implement
{{PhantomReference}} APIs.

142	    final int maxSeconds = 10;
143	    for (int i = 0; i < maxSeconds; i++) {
144	      Thread.sleep(1000L);
145	      allDataSize = stats.getAllThreadLocalDataSize();
146	      if (allDataSize == 0) {
147	        LOG.info("cleaned up after " + (i+1) + " seconds");
148	        break;
149	      } else {
150	        LOG.info("not cleaned up after " + (i+1) + " seconds; waiting...");
151	      }
152	    }

Let's use {{GenericTestUtils#waitFor}} here.

+1 once those are addressed.  Thanks, [~sjlee0].

> long running apps may have a huge number of StatisticsData instances under FileSystem
> -------------------------------------------------------------------------------------
>                 Key: HADOOP-12107
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12107
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.7.0
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>            Priority: Minor
>         Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, HADOOP-12107.003.patch,
> We observed with some of our apps (non-mapreduce apps that use filesystems) that they
end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}}
(in the {{allData}} list of {{Statistics}}).
> Although the thread reference from {{StatisticsData}} is a weak reference, and thus can
get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't
get cleared until any of these following methods is called on {{Statistics}}:
> - {{getBytesRead()}}
> - {{getBytesWritten()}}
> - {{getReadOps()}}
> - {{getLargeReadOps()}}
> - {{getWriteOps()}}
> - {{toString()}}
> It is quite possible to have an application that interacts with a filesystem but does
not call any of these methods on the {{Statistics}}. If such an application runs for a long
time and has a large amount of thread churn, the memory footprint will grow significantly.
> The current workaround is either to limit the thread churn or to invoke these operations
occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}}
itself in that the memory is controlled only as a side effect of those operations.

This message was sent by Atlassian JIRA

View raw message