hadoop-common-dev mailing list archives

From "Pete Wyckoff (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3110) NameNode does logging in critical sections just about everywhere
Date Thu, 27 Mar 2008 21:49:24 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582803#action_12582803 ]

Pete Wyckoff commented on HADOOP-3110:
--------------------------------------

I've seen a case in the JobTracker where a large number of failed tasks at one time can cause the
JobTracker to lock up because of this. The JT ends up writing hundreds of error messages to the
log, and by the time it's done the next set of tasks has already failed, because they all timed
out waiting for the JobTracker to write the last batch of failures, and so on and on...

These are critical sections, so if you have a queue of 200 clients waiting to do an operation,
the clients at the back of the queue each wait roughly 200 * (logging time) on top of the actual
work, and during those 200 log calls at least one will probably force a flush of the Java buffers
to the OS.
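
To make the pattern concrete, here is a minimal sketch of the two shapes being discussed; the
class and method names are hypothetical, not the actual JobTracker or NameNode code. In the first
method the log call happens while the lock is held, so every waiting thread also waits for the
message to be built and written; in the second the lock only covers the state update.

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;

    public class BlockRegistry {
      private static final Log LOG = LogFactory.getLog(BlockRegistry.class);
      private int storedBlocks = 0;

      // The pattern this issue complains about: the global lock is held
      // while the message is built and handed to the appenders.
      public synchronized void addBlockLoggingInside(String blockId) {
        storedBlocks++;
        LOG.info("addStoredBlock: " + blockId + " total=" + storedBlocks);
      }

      // Alternative: do the minimum under the lock, then log after releasing it.
      public void addBlockLoggingOutside(String blockId) {
        int total;
        synchronized (this) {
          storedBlocks++;
          total = storedBlocks;
        }
        LOG.info("addStoredBlock: " + blockId + " total=" + total);
      }
    }

With 200 threads queued on the lock, the first version serializes 200 log writes inside the
critical section; the second only serializes the counter update.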

We don't have a benchmarking cluster here unfortunately, but I can tell you one place where I
perceive this pain: the FUSE-dfs mountable filesystem. When there are no other DFS clients it's
lightning fast, but add a few Map/Reduce jobs and the thing really, *really* slows down. The FUSE
client ends up making multiple calls per filesystem operation, so these delays waiting for other
processes to relinquish the NameNode global lock really hurt it. How much this issue contributes
to that I don't know, but for real-time clients we know that DFS is pretty weak right now.




> NameNode does logging in critical sections just about everywhere
> ----------------------------------------------------------------
>
>                 Key: HADOOP-3110
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3110
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.14.0, 0.14.1, 0.14.2, 0.14.3, 0.14.4, 0.15.0, 0.15.1, 0.15.2, 0.15.3, 0.16.0, 0.16.1
>         Environment: All
>            Reporter: Pete Wyckoff
>
> e.g., FSNameSystem.addStoredBlock (but almost every method has logging in its critical sections).
> This method is synchronized and it's spitting something out to Log.info for every block stored.
> Normally not a big deal, but since this is in the NameNode and these are critical sections...
> We shouldn't do any logging at all in critical sections, so even the info and warn calls are
> bad. But in many places in the code it would be hard to tease these out (although eventually
> they really should be), so the system could start using something like an AsyncAppender and see
> how that improves performance (a rough sketch follows below, after this quoted description).
> Even though the log may have a buffer, the formatting and writing still cause a drag on
> performance with 100s/1000s of machines trying to talk to the NameNode.
> At a minimum, the most often triggered Log.info calls could be changed to Log.debug.
> for reference: http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/AsyncAppender.html
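
As a rough illustration of the AsyncAppender suggestion above, here is a minimal sketch assuming
the log4j 1.2 API that Hadoop uses underneath commons-logging; the class and method names below
are hypothetical, not anything in the Hadoop tree. It wraps whatever appenders are already
configured on the root logger in an AsyncAppender, so the thread holding the NameNode lock only
enqueues the logging event and a background thread hands it to the real appenders.

    import java.util.Enumeration;
    import org.apache.log4j.Appender;
    import org.apache.log4j.AsyncAppender;
    import org.apache.log4j.Logger;

    public class AsyncLoggingSetup {
      // Hypothetical helper: route the root logger's existing appenders
      // through an AsyncAppender so callers only pay for an in-memory enqueue.
      public static void makeRootLoggerAsync(int bufferSize) {
        Logger root = Logger.getRootLogger();

        AsyncAppender async = new AsyncAppender();
        async.setBufferSize(bufferSize);  // events buffered before callers block

        Enumeration<?> existing = root.getAllAppenders();
        while (existing.hasMoreElements()) {
          async.addAppender((Appender) existing.nextElement());
        }

        root.removeAllAppenders();
        root.addAppender(async);
      }
    }

In log4j 1.2 the AsyncAppender can only be wired up via the XML configurator or programmatically
like this, not via log4j.properties. It doesn't remove the cost entirely, since the message string
is still built on the caller's thread and callers block if the buffer fills, but it does keep the
disk write out of the critical section.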

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

