Message-ID: <32520996.1147456208681.JavaMail.root@brutus>
Date: Fri, 12 May 2006 17:50:08 +0000 (GMT+00:00)
From: "Sameer Paranjpye (JIRA)"
To: hadoop-dev@lucene.apache.org
Reply-To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-211) logging improvements for Hadoop
In-Reply-To: <11693488.1147389488398.JavaMail.root@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

    [ http://issues.apache.org/jira/browse/HADOOP-211?page=comments#action_12383227 ]

Sameer Paranjpye commented on HADOOP-211:
-----------------------------------------

Yes, the suggestions about
formats are meant to be defaults. This is one more reason for using Log4J: it gives you a fair amount of freedom in specifying formats through configuration.

> logging improvements for Hadoop
> -------------------------------
>
>          Key: HADOOP-211
>          URL: http://issues.apache.org/jira/browse/HADOOP-211
>      Project: Hadoop
>         Type: Improvement
>     Versions: 0.2
>     Reporter: Sameer Paranjpye
>     Assignee: Sameer Paranjpye
>     Priority: Minor
>      Fix For: 0.3
>
> Here's a proposal for some improvements to the way Hadoop does logging. It advocates 3
> broad changes to the way logging is currently done, these being:
> - The use of a uniform logging format by all Hadoop subsystems
> - The use of Apache commons logging as a facade above an underlying logging framework
> - The use of Log4J as the underlying logging framework instead of java.util.logging
> This is largely polishing work, but it seems like it would make log analysis and debugging
> easier in the short term. In the long term, it would future-proof logging to the extent of
> allowing the logging framework used to change while requiring minimal code change. The
> proposed changes are motivated by the following requirements which we think Hadoop's
> logging should meet:
> - Hadoop's logs should be amenable to analysis by tools like grep, sed, awk etc.
> - Log entries should be clearly annotated with a timestamp and a logging level
> - Log entries should be traceable to the subsystem from which they originated
> - The logging implementation should allow log entries to be annotated with source code
> location information like classname, methodname, file and line number, without requiring
> code changes
> - It should be possible to change the logging implementation used without having to change
> thousands of lines of code
> - The mapping of loggers to destinations (files, directories, servers etc.) should be
> specified and modifiable via configuration
> Uniform logging format:
> All Hadoop logs should have the following structure.
>
> <header line>\n
> <log entry>\n [<log entry>\n]
> .
> .
> .
> where the header line specifies the format of each log entry. The header line has the format:
> '# ...\n'.
> The default format of each log entry is: '# Timestamp Level LoggerName Message', where:
> - Timestamp is a date and time in the format MM/DD/YYYY:HH:MM:SS
> - Level is the logging level (FATAL, WARN, DEBUG, TRACE, etc.)
> - LoggerName is the short name of the logging subsystem from which the message originated e.g.
> fs.FSNamesystem, dfs.Datanode etc.
> - Message is the log message produced
> Why Apache commons logging and Log4J?
> Apache commons logging is a facade meant to be used as a wrapper around an underlying logging
> implementation. Bridges from Apache commons logging to popular logging implementations
> (Java logging, Log4J, Avalon etc.) are implemented and available as part of the commons logging
> distribution. Implementing a bridge to an unsupported implementation is fairly straightforward
> and involves the implementation of subclasses of the commons logging LogFactory and Logger
> classes. Using Apache commons logging and making all logging calls through it enables us to
> move to a different logging implementation by simply changing configuration in the best case.
> Even otherwise, it incurs minimal code churn overhead.
> Log4J offers a few benefits over java.util.logging that make it a more desirable choice for the
> logging back end.
> - Configuration Flexibility: The mapping of loggers to destinations (files, sockets etc.)
> can be completely specified in configuration. It is possible to do this with Java logging as
> well; however, configuration is a lot more restrictive. For instance, with Java logging all
> log files must have names derived from the same pattern. For the namenode, log files could
> be named with the pattern "%h/namenode%u.log" which would put log files in the user.home
> directory with names like namenode0.log etc.
> With Log4J it would be possible to configure
> the namenode to emit log files with different names, say heartbeats.log, namespace.log,
> clients.log etc. Configuration variables in Log4J can also have the values of system
> properties embedded in them.
> - Takes wrappers into account: Log4J takes into account the possibility that an application
> may be invoking it via a wrapper, such as Apache commons logging. This is important because
> logging event objects must be able to infer the context of the logging call such as classname,
> methodname etc. Inferring context is a relatively expensive operation that involves creating
> an exception and examining the stack trace to find the frame just before the first frame
> of the logging framework. It is therefore done lazily, only when this information actually
> needs to be logged. Log4J can be instructed to look for the frame corresponding to the wrapper
> class; Java logging cannot. In the case of Java logging this means that a) the bridge from
> Apache commons logging is responsible for inferring the calling context and setting it in the
> logging event and b) this inference has to be done on every logging call regardless of whether
> or not it is needed.
> - More handy features: Log4J has some handy features that Java logging doesn't. A couple
> of examples of these:
> a) Date-based rolling of log files
> b) Format control through configuration. Log4J has a PatternLayout class that can be
> configured to generate logs with a user-specified pattern. The logging format described
> above can be described as "%d{MM/dd/yyyy:HH:mm:ss} %p %c{2} %m". The format specifiers
> indicate that each log line should have the date and time followed by the logging level
> or priority followed by the logger name followed by the application generated message.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
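
For readers who want to see the default entry layout from the quoted proposal ('Timestamp Level LoggerName Message') in concrete form, here is a minimal sketch using only the JDK. It is not Hadoop code and not how Log4J produces the line. The class name UniformLogEntry and its two methods are hypothetical, and the fixed UTC time zone is an assumption made only to keep the output deterministic. Note that in java.text.SimpleDateFormat the seconds field is written "ss"; "SS" would print milliseconds.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical helper illustrating the proposed entry layout; not part of Hadoop. */
class UniformLogEntry {
    // Timestamp layout from the proposal: MM/DD/YYYY:HH:MM:SS (seconds are "ss").
    private static final String TIMESTAMP_PATTERN = "MM/dd/yyyy:HH:mm:ss";

    /** Builds one entry as "Timestamp Level LoggerName Message". */
    static String format(Date when, String level, String loggerName, String message) {
        SimpleDateFormat fmt = new SimpleDateFormat(TIMESTAMP_PATTERN, Locale.ROOT);
        // Assumption: UTC, chosen only so the example prints the same text everywhere.
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(when) + " " + level + " " + loggerName + " " + message;
    }

    /** Splits an entry into its four fields; the message keeps its internal spaces. */
    static String[] parse(String entry) {
        return entry.split(" ", 4);
    }

    public static void main(String[] args) {
        String entry = format(new Date(0L), "WARN", "dfs.Datanode", "lost heartbeat from node");
        System.out.println(entry);
        // The third field is the logger name, i.e. the originating subsystem.
        System.out.println(parse(entry)[2]);
    }
}
```

Because the first three fields contain no spaces, an entry in this layout can be split on whitespace, which is what makes it convenient for grep, sed and awk as the requirements ask.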