hadoop-hdfs-issues mailing list archives

From "Mingliang Liu (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-10713) Throttle FsNameSystem lock warnings
Date Wed, 31 Aug 2016 21:42:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453359#comment-15453359 ]

Mingliang Liu edited comment on HDFS-10713 at 8/31/16 9:42 PM:
---------------------------------------------------------------

One concern is whether we need to dump the longest lock-holding interval seen during the
suppressed interval, including its duration and the holding thread's stack. This should reveal
more useful information. One extreme example is a case where two threads (t1 and t2) hold the
write lock in sequence: *t1-1s, t2-100s, t1-1s*. In the current implementation the t2
information will be missing, though it is the more interesting one.
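
To make the concern concrete, here is a minimal Java sketch of one way the longest suppressed hold could be remembered and reported together with the suppressed count. It is not the code in the attached patches: the class {{ThrottledLockWarner}}, its method and field names, and the message wording (other than the suppressed-count text quoted in this comment) are hypothetical, and the caller is assumed to pass in the current time, the hold duration, and the holder's stack as strings.

```java
/** Hypothetical helper for illustration only; not taken from the HDFS-10713 patch. */
final class ThrottledLockWarner {
  private final long reportingThresholdMs;   // holds shorter than this are ignored
  private final long suppressIntervalMs;     // minimum gap between two logged warnings

  private boolean reportedBefore = false;
  private long lastReportMs = 0L;
  private int numSuppressed = 0;
  private long longestSuppressedMs = 0L;
  private String longestSuppressedThread = null;
  private String longestSuppressedStack = null;

  ThrottledLockWarner(long reportingThresholdMs, long suppressIntervalMs) {
    this.reportingThresholdMs = reportingThresholdMs;
    this.suppressIntervalMs = suppressIntervalMs;
  }

  /**
   * Called when the write lock is released.
   * Returns the warning text to log, or null if nothing should be logged now.
   */
  synchronized String onUnlock(long nowMs, long heldMs, String thread, String stack) {
    if (heldMs < reportingThresholdMs) {
      return null;
    }
    if (reportedBefore && nowMs - lastReportMs < suppressIntervalMs) {
      // Suppressed: count it and remember only the longest hold seen so far.
      numSuppressed++;
      if (heldMs > longestSuppressedMs) {
        longestSuppressedMs = heldMs;
        longestSuppressedThread = thread;
        longestSuppressedStack = stack;
      }
      return null;
    }
    StringBuilder sb = new StringBuilder();
    sb.append("Write lock held for ").append(heldMs).append(" ms by ")
        .append(thread).append('\n').append(stack);
    sb.append("\n\tNumber of suppressed write-lock reports: ").append(numSuppressed);
    if (numSuppressed > 0) {
      sb.append("\n\tLongest suppressed hold: ").append(longestSuppressedMs)
          .append(" ms by ").append(longestSuppressedThread)
          .append('\n').append(longestSuppressedStack);
    }
    // Start a new suppression window.
    reportedBefore = true;
    lastReportMs = nowMs;
    numSuppressed = 0;
    longestSuppressedMs = 0L;
    longestSuppressedThread = null;
    longestSuppressedStack = null;
    return sb.toString();
  }
}
```

With a 2 minute suppress interval and the sequence *t1-1s, t2-100s, t1-1s*, the first t1 hold is logged and the other two are suppressed, but the next logged warning still carries t2's 100s hold and its stack.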

{code:title=DFSConfigKeys.java}
  // Threshold for how long the write lock warnings must be suppressed
  public static final String DFS_LOCK_SUPPRESS_WARNING_INTERVAL_MS_KEY =
      "dfs.lock.suppress.warning.interval.ms";
  public static final long DFS_LOCK_SUPPRESS_WARNING_INTERVAL_MS_DEFAULT =
      120000L;
{code}
And
{code:title=hdfs-default.xml}
<property>
  <name>dfs.lock.suppress.warning.interval.ms</name>
  <value>1000</value>
  <description>The interval between reporting lock warnings.
  </description>
</property>
{code}
I believe the default value of config key {{dfs.lock.suppress.warning.interval.ms}} should be
2 minutes (120000 ms, as in DFSConfigKeys.java), not 1 second (1000 ms, as in hdfs-default.xml)?
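
Why the mismatch matters can be shown with a small stand-in. The sketch below uses {{java.util.Properties}} instead of Hadoop's {{Configuration}}, and the class {{DefaultLookupSketch}} and its {{getLong}} helper are hypothetical. It assumes the usual key/default lookup behaviour, where the code-level default is used only when no value for the key has been loaded; under that assumption the 1000 in the XML file would be the effective default and the 120000L constant would never be used.

```java
import java.util.Properties;

/** Hypothetical stand-in for a key/default lookup; not Hadoop's Configuration class. */
final class DefaultLookupSketch {
  static final String KEY = "dfs.lock.suppress.warning.interval.ms";
  static final long CODE_DEFAULT_MS = 120000L;  // value from DFSConfigKeys.java

  /** Returns the loaded value if the key is present, otherwise the code-level fallback. */
  static long getLong(Properties loaded, String key, long fallback) {
    String value = loaded.getProperty(key);
    return value == null ? fallback : Long.parseLong(value.trim());
  }

  public static void main(String[] args) {
    Properties fromDefaultsFile = new Properties();
    fromDefaultsFile.setProperty(KEY, "1000");  // value from hdfs-default.xml

    // The value from the defaults file is found, so the fallback is ignored.
    System.out.println(getLong(fromDefaultsFile, KEY, CODE_DEFAULT_MS));
    // With no loaded value, the code-level default applies.
    System.out.println(getLong(new Properties(), KEY, CODE_DEFAULT_MS));
  }
}
```

Keeping the two values identical avoids this kind of silent divergence, whichever of them is intended.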

Minor comments:
# In line 1571, message {{"Number of suppressed write-lock reports: " + numSuppressedWarnings);}}
should have a "\n" or "\t" before it.
# Let's declare both fields final: {{private final long writeLockReportingThreshold;}} and
{{private final long writeLockSuppressWarningInterval;}}.

As follow-up work, [~jingzhao] also suggests we look at the feasibility of exposing this
information to nntop metrics.


was (Author: liuml07):
{code:title=DFSConfigKeys.java}
  // Threshold for how long the write lock warnings must be suppressed
  public static final String DFS_LOCK_SUPPRESS_WARNING_INTERVAL_MS_KEY =
      "dfs.lock.suppress.warning.interval.ms";
  public static final long DFS_LOCK_SUPPRESS_WARNING_INTERVAL_MS_DEFAULT =
      120000L;
{code}
And
{code:title=hdfs-default.xml}
<property>
  <name>dfs.lock.suppress.warning.interval.ms</name>
  <value>1000</value>
  <description>The interval between reporting lock warnings.
  </description>
</property>
{code}
I believe the default value of config key {{dfs.lock.suppress.warning.interval.ms}} is 2 mins
not 1 second?

Minor comments:
In line 1571, message {{"Number of suppressed write-lock reports: " + numSuppressedWarnings);}}
should have a "\n" or "\t" before it.

> Throttle FsNameSystem lock warnings
> -----------------------------------
>
>                 Key: HDFS-10713
>                 URL: https://issues.apache.org/jira/browse/HDFS-10713
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: logging, namenode
>            Reporter: Arpit Agarwal
>            Assignee: Hanisha Koneru
>         Attachments: HDFS-10713.000.patch, HDFS-10713.001.patch, HDFS-10713.002.patch
>
>
> The NameNode logs a message if the FSNamesystem write lock is held by a thread for over
> 1 second. These messages can be throttled to at most one per x minutes to avoid potentially
> filling up NN logs. We can also log the number of suppressed notices since the last log message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

