hadoop-common-dev mailing list archives

From "eric baldeschwieler (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-211) logging improvements for Hadoop
Date Fri, 12 May 2006 06:56:10 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-211?page=comments#action_12379176 ] 

eric baldeschwieler commented on HADOOP-211:

I suggest we use the ISO 8601 time format.


This would suggest yyyy-MM-ddTHH:mm:ss, such as 2006-05-11T23:47:03.

The T is a literal separator and no one ever likes it. Change it for all I care, but standards
are good. This also suggests UTC, which I think is a good default, but the standard also allows
for local time with a distinct notation that appends the UTC offset, such as 2006-05-11T23:47:03-08.
We could support that as a config option if folks care.

This format also sorts correctly as plain text, which is nice, and it avoids localization
issues (MM-dd vs. dd-MM).
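A minimal Java sketch of the suggested format, assuming the java.time API (which postdates this thread; `SimpleDateFormat` with a UTC `TimeZone` does the same on older JVMs). In Java patterns the literal T must be quoted and seconds are lowercase `ss`. The class and method names (`Iso8601Stamp`, `utc`, `withOffset`) are hypothetical.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class Iso8601Stamp {
    // 'T' is quoted because it is a literal separator; lowercase ss is seconds
    // (uppercase SS would be fraction-of-second).
    static final DateTimeFormatter UTC_FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss").withZone(ZoneOffset.UTC);

    // Local-time variant: a single X appends the offset as hours, e.g. -08
    // (minutes are added for offsets such as -0830, and Z is printed for UTC).
    static final DateTimeFormatter OFFSET_FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ssX");

    static String utc(Instant t) {
        return UTC_FORMAT.format(t);
    }

    static String withOffset(Instant t, ZoneOffset offset) {
        return OFFSET_FORMAT.format(OffsetDateTime.ofInstant(t, offset));
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2006-05-12T07:47:03Z");
        System.out.println(utc(t));                                // 2006-05-12T07:47:03
        System.out.println(withOffset(t, ZoneOffset.ofHours(-8))); // 2006-05-11T23:47:03-08

        // Fixed-width fields ordered from most to least significant, so a plain
        // string sort of UTC stamps is also a chronological sort.
        List<String> stamps = new ArrayList<>(Arrays.asList(
            utc(Instant.parse("2006-12-01T00:00:00Z")),
            utc(Instant.parse("2006-05-11T23:47:03Z"))));
        Collections.sort(stamps);
        System.out.println(stamps); // [2006-05-11T23:47:03, 2006-12-01T00:00:00]
    }
}
```

Both printed stamps denote the same instant; the local form simply carries its offset, which is why UTC makes a convenient default for logs gathered from many machines.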

> logging improvements for Hadoop
> -------------------------------
>          Key: HADOOP-211
>          URL: http://issues.apache.org/jira/browse/HADOOP-211
>      Project: Hadoop
>         Type: Improvement
>     Versions: 0.2
>     Reporter: Sameer Paranjpye
>     Assignee: Sameer Paranjpye
>     Priority: Minor
>      Fix For: 0.3

> Here's a proposal for some improvements to the way Hadoop does logging. It advocates three
> broad changes to the way logging is currently done, these being:
> - The use of a uniform logging format by all Hadoop subsystems
> - The use of Apache commons logging as a facade above an underlying logging framework
> - The use of Log4J as the underlying logging framework instead of java.util.logging
> This is largely polishing work, but it seems like it would make log analysis and debugging
> easier in the short term. In the long term, it would future-proof logging to the extent of
> allowing the logging framework to change while requiring minimal code changes. The
> proposed changes are motivated by the following requirements, which we think Hadoop's
> logging should meet:
> - Hadoop's logs should be amenable to analysis by tools like grep, sed, awk etc.
> - Log entries should be clearly annotated with a timestamp and a logging level
> - Log entries should be traceable to the subsystem from which they originated
> - The logging implementation should allow log entries to be annotated with source code
> location information like classname, methodname, file and line number, without requiring
> code changes
> - It should be possible to change the logging implementation used without having to change
> thousands of lines of code
> - The mapping of loggers to destinations (files, directories, servers etc.) should be
> specified and modifiable via configuration
> Uniform logging format:
> All Hadoop logs should have the following structure.
> <Header>\n
> <LogEntry>\n [<Exception>\n]
> .
> .
> .
> where the header line specifies the format of each log entry. The header line has the
> format '# <Fieldname> <Fieldname>...\n'.
> The default format of each log entry is: '# Timestamp Level LoggerName Message', where:
> - Timestamp is a date and time in the format MM/DD/YYYY:HH:MM:SS
> - Level is the logging level (FATAL, WARN, DEBUG, TRACE, etc.)
> - LoggerName is the short name of the logging subsystem from which the message originated,
> e.g. fs.FSNamesystem, dfs.Datanode etc.
> - Message is the log message produced
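A sketch of writing and reading back the layout described above: a `# Timestamp Level LoggerName Message` header, then one space-separated entry per line. The names `UniformLogFormat`, `header`, `entry` and `parse` are hypothetical, and rendering the timestamp in UTC is an assumption (the proposal does not name a time zone). Because the timestamp contains no spaces and the message is the last field, a split limited to four fields, much like an awk one-liner, recovers every field.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TimeZone;

class UniformLogFormat {
    static final String[] FIELDS = {"Timestamp", "Level", "LoggerName", "Message"};

    // Header line of the form '# <Fieldname> <Fieldname>...'.
    static String header() {
        return "# " + String.join(" ", FIELDS);
    }

    // One entry in header order; the timestamp is rendered in UTC (an assumption).
    static String entry(Date when, String level, String loggerName, String message) {
        SimpleDateFormat fmt = new SimpleDateFormat("MM/dd/yyyy:HH:mm:ss");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(when) + " " + level + " " + loggerName + " " + message;
    }

    // Split an entry back into named fields; the split limit keeps the message intact.
    static Map<String, String> parse(String line) {
        String[] parts = line.split(" ", FIELDS.length);
        if (parts.length != FIELDS.length) {
            throw new IllegalArgumentException("not a log entry: " + line);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < FIELDS.length; i++) {
            fields.put(FIELDS[i], parts[i]);
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = entry(new Date(0L), "WARN", "dfs.Datanode", "disk almost full");
        System.out.println(header()); // # Timestamp Level LoggerName Message
        System.out.println(line);     // 01/01/1970:00:00:00 WARN dfs.Datanode disk almost full
        System.out.println(parse(line).get("Message")); // disk almost full
    }
}
```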
> Why Apache commons logging and Log4J?
> Apache commons logging is a facade meant to be used as a wrapper around an underlying
> implementation. Bridges from Apache commons logging to popular logging implementations
> (Java logging, Log4J, Avalon etc.) are implemented and available as part of the commons
> distribution. Implementing a bridge to an unsupported implementation is fairly straightforward
> and involves subclassing the commons logging LogFactory class and implementing its Log
> interface. Using Apache commons logging and making all logging calls through it enables us
> to move to a different logging implementation by simply changing configuration in the best
> case. Even otherwise, it incurs minimal code churn.
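To illustrate why a facade limits code churn, here is a stdlib-only sketch of the idea: call sites depend only on a small interface, and a factory picks the backend from configuration. Every name here (`AppLog`, `AppLogFactory`, `JdkLog`, `MemoryLog`, and the `demo.log.impl` system property) is hypothetical; the real commons logging types are `org.apache.commons.logging.Log` and `LogFactory`, whose discovery rules are more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

class FacadeDemo {
    public static void main(String[] args) {
        // Backend selection lives in configuration; this call site never changes.
        System.setProperty("demo.log.impl", "memory");
        AppLog log = AppLogFactory.getLog(FacadeDemo.class);
        log.warn("heartbeat missed");
        System.out.println(MemoryLog.LINES); // [WARN FacadeDemo heartbeat missed]
    }
}

// Hypothetical facade interface: application code depends only on this.
interface AppLog {
    void info(Object message);
    void warn(Object message);
}

// Backend that delegates to java.util.logging.
class JdkLog implements AppLog {
    private final Logger logger;
    JdkLog(String name) { this.logger = Logger.getLogger(name); }
    public void info(Object message) { logger.log(Level.INFO, String.valueOf(message)); }
    public void warn(Object message) { logger.log(Level.WARNING, String.valueOf(message)); }
}

// Backend that records lines in memory, standing in for any other framework.
class MemoryLog implements AppLog {
    static final List<String> LINES = new ArrayList<>();
    private final String name;
    MemoryLog(String name) { this.name = name; }
    public void info(Object message) { LINES.add("INFO " + name + " " + message); }
    public void warn(Object message) { LINES.add("WARN " + name + " " + message); }
}

// Factory: picks the backend from the hypothetical demo.log.impl system property.
final class AppLogFactory {
    private AppLogFactory() {}

    static AppLog getLog(Class<?> clazz) {
        String impl = System.getProperty("demo.log.impl", "jdk");
        if (impl.equals("memory")) {
            return new MemoryLog(clazz.getName());
        }
        return new JdkLog(clazz.getName());
    }
}
```

Swapping frameworks then means adding one backend class and changing a property, while the thousands of call sites stay untouched.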
> Log4J offers a few benefits over java.util.logging that make it a more desirable choice for
> the logging back end:
> - Configuration Flexibility: The mapping of loggers to destinations (files, sockets etc.)
> can be completely specified in configuration. It is possible to do this with Java logging as
> well; however, configuration is a lot more restrictive. For instance, with Java logging
> log files must have names derived from the same pattern. For the namenode, log files would
> be named with the pattern "%h/namenode%u.log", which would put log files in the user.home
> directory with names like namenode0.log etc. With Log4J it would be possible to configure
> the namenode to emit log files with different names, say heartbeats.log, namespace.log,
> clients.log etc. Configuration variables in Log4J can also have the values of system
> properties embedded in them.
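Log4J's property configurator expands `${name}` references in configuration values, consulting system properties first. The stdlib-only sketch below mimics that idea to show how per-logger file names could take their directory from a system property. It is not Log4J's implementation: the `file.*` keys and the `hadoop.log.dir` property are hypothetical, substitution is not recursive, and turning unresolved names into empty strings is a choice made for this sketch.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConfigSubst {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replace each ${name} with the system property of that name, falling back to
    // the configuration's own entry, and finally to the empty string.
    static String substitute(String value, Properties config) {
        Matcher m = VAR.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String replacement = System.getProperty(name, config.getProperty(name, ""));
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical keys: one file per namenode logger, all under a directory
        // supplied as a system property at startup.
        Properties config = new Properties();
        config.setProperty("file.heartbeats", "${hadoop.log.dir}/heartbeats.log");
        config.setProperty("file.namespace", "${hadoop.log.dir}/namespace.log");
        System.setProperty("hadoop.log.dir", "/var/log/hadoop");
        for (String key : new String[] {"file.heartbeats", "file.namespace"}) {
            System.out.println(key + " -> " + substitute(config.getProperty(key), config));
        }
        // file.heartbeats -> /var/log/hadoop/heartbeats.log
        // file.namespace -> /var/log/hadoop/namespace.log
    }
}
```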
> - Takes wrappers into account: Log4J takes into account the possibility that an application
> may be invoking it via a wrapper, such as Apache commons logging. This is important because
> logging event objects must be able to infer the context of the logging call, such as classname,
> methodname etc. Inferring context is a relatively expensive operation that involves creating
> an exception and examining the stack trace to find the frame just before the first frame
> of the logging framework. It is therefore done lazily, only when this information actually
> needs to be logged. Log4J can be instructed to look for the frame corresponding to the wrapper
> class; Java logging cannot. In the case of Java logging this means that a) the bridge from
> Apache commons logging is responsible for inferring the calling context and setting it in the
> logging event and b) this inference has to be done on every logging call regardless of whether
> or not it is needed.
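The stack-walking technique described above can be sketched as follows: create a `Throwable`, scan its stack trace for the wrapper's frames, and report the first frame after them, skipping the walk entirely when location is not needed. This is an illustrative stdlib-only version of the idea, not Log4J's code (Log4J 1.x receives the wrapper's fully qualified class name through methods such as `log(String callerFQCN, Priority level, Object message, Throwable t)`); `CallerInference`, `Wrapper` and `Application` are hypothetical.

```java
class CallerDemo {
    public static void main(String[] args) {
        System.out.println(Application.doWork()); // Application.doWork: working
        Wrapper.includeLocation = false;
        System.out.println(Application.doWork()); // working
    }
}

class CallerInference {
    // Walk the stack of a freshly created Throwable and return the first frame
    // that follows the frames of wrapperClassName, i.e. the wrapper's caller.
    static StackTraceElement callerOf(String wrapperClassName) {
        boolean seenWrapper = false;
        for (StackTraceElement frame : new Throwable().getStackTrace()) {
            if (frame.getClassName().equals(wrapperClassName)) {
                seenWrapper = true;
            } else if (seenWrapper) {
                return frame;
            }
        }
        return null; // the wrapper is not on the current stack
    }
}

// Hypothetical logging wrapper (the role commons logging plays in the text).
class Wrapper {
    static boolean includeLocation = true;

    static String info(String message) {
        if (!includeLocation) {
            return message; // lazy: no stack walk when location is not logged
        }
        StackTraceElement caller = CallerInference.callerOf(Wrapper.class.getName());
        String where = (caller == null) ? "?" : caller.getClassName() + "." + caller.getMethodName();
        return where + ": " + message;
    }
}

// Hypothetical application code that logs through the wrapper.
class Application {
    static String doWork() {
        return Wrapper.info("working");
    }
}
```

Passing the wrapper's class name down is what lets the walk skip the wrapper; a backend that only knows its own class name would report `Wrapper.info` as the caller.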
> - More handy features: Log4J has some handy features that Java logging doesn't. A couple
> of examples of these:
> a) Date-based rolling of log files
> b) Format control through configuration. Log4J has a PatternLayout class that can be
> configured to generate logs with a user-specified pattern. The logging format described
> above can be described as "%d{MM/dd/yyyy:HH:mm:ss} %c{2} %p %m%n". The format specifiers
> indicate that each log line should have the date and time followed by the logger name,
> followed by the logging level or priority, followed by the application-generated message
> and a line separator.
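A rough Java approximation of what that pattern produces for one event. `%c{2}` keeps only the rightmost two components of the logger name, and `%d{...}` takes a `SimpleDateFormat` pattern, where seconds are lowercase `ss` (`SS` means milliseconds). `PatternSketch` and its methods are hypothetical, the logger name in the example is made up, and UTC is used only so the output is reproducible.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

class PatternSketch {
    // Mimics %c{n}: keep only the rightmost n dot-separated components of a logger name.
    static String lastComponents(String name, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        int idx = name.length();
        for (int i = 0; i < n; i++) {
            idx = name.lastIndexOf('.', idx - 1);
            if (idx < 0) {
                return name; // fewer than n components: keep the whole name
            }
        }
        return name.substring(idx + 1);
    }

    // Mimics "%d{MM/dd/yyyy:HH:mm:ss} %c{2} %p %m%n" for one event (UTC for reproducibility).
    static String format(Date when, String loggerName, String level, String message) {
        SimpleDateFormat fmt = new SimpleDateFormat("MM/dd/yyyy:HH:mm:ss");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(when) + " " + lastComponents(loggerName, 2) + " " + level + " "
            + message + System.lineSeparator();
    }

    public static void main(String[] args) {
        System.out.print(format(new Date(0L), "org.apache.hadoop.dfs.FSNamesystem", "INFO", "started"));
        // 01/01/1970:00:00:00 dfs.FSNamesystem INFO started
    }
}
```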

This message is automatically generated by JIRA.
