hadoop-common-dev mailing list archives

From Eric Baldeschwieler <eri...@yahoo-inc.com>
Subject Re: [jira] Commented: (HADOOP-211) logging improvements for Hadoop
Date Thu, 18 May 2006 02:56:33 GMT
Once we settle on something, let's post it on the twiki.

On May 17, 2006, at 10:07 AM, Doug Cutting (JIRA) wrote:

>     [ http://issues.apache.org/jira/browse/HADOOP-211?page=comments#action_12412213 ]
>
> Doug Cutting commented on HADOOP-211:
> -------------------------------------
>
> The semantics I use for levels is something like:
>
> SEVERE: if this is a production system, someone should be paged,  
> red lights should flash, etc.  Something is definitely wrong and  
> the system is not operating correctly.  Intervention is required.   
> This should be used sparingly.
>
> WARN: in a production system, warnings should be propagated &  
> summarized on a central console.  If lots are generated then  
> something may be wrong.
>
> INFO, FINE, FINER, etc. are used for debugging.  INFO is the level
> normally logged in production; FINE, FINER, etc. are typically only
> used when developing.
>
> Is that consistent with the way others use these?
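These semantics can be lined up against java.util.logging, the framework Hadoop used at the time. Note that SEVERE, FINE and FINER are java.util.logging names, whose warning level is spelled WARNING; WARN is the Log4J and commons logging spelling. The class `LevelSemantics` and the response strings are hypothetical, an assumed encoding of the description above rather than anything in Hadoop.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper: encodes the level semantics described above.
final class LevelSemantics {
    // Map a level to the operational response it should trigger (assumed labels).
    static String responseFor(Level level) {
        int v = level.intValue();
        if (v >= Level.SEVERE.intValue()) {
            return "page";        // intervention required; use sparingly
        } else if (v >= Level.WARNING.intValue()) {
            return "console";     // propagate and summarize centrally
        } else if (v >= Level.INFO.intValue()) {
            return "log";         // recorded in production by default
        } else {
            return "development"; // FINE, FINER, ...: usually only while developing
        }
    }

    // A production logger records INFO and above; FINE and finer are discarded.
    static Logger productionLogger(String name) {
        Logger logger = Logger.getLogger(name);
        logger.setLevel(Level.INFO);
        return logger;
    }
}
```

With the threshold at INFO, `isLoggable(Level.FINE)` returns false, so callers can skip building expensive debug messages entirely in production.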
>
>
>> logging improvements for Hadoop
>> -------------------------------
>>
>>          Key: HADOOP-211
>>          URL: http://issues.apache.org/jira/browse/HADOOP-211
>>      Project: Hadoop
>>         Type: Improvement
>>     Versions: 0.2
>>     Reporter: Sameer Paranjpye
>>     Assignee: Sameer Paranjpye
>>     Priority: Minor
>>      Fix For: 0.3
>>
>> Here's a proposal for some improvements to the way Hadoop does logging.
>> It advocates three broad changes to the way logging is currently done:
>> - The use of a uniform logging format by all Hadoop subsystems
>> - The use of Apache commons logging as a facade above an underlying
>>   logging framework
>> - The use of Log4J as the underlying logging framework instead of
>>   java.util.logging
>>
>> This is largely polishing work, but it seems like it would make log
>> analysis and debugging easier in the short term. In the long term, it
>> would future-proof logging by allowing the underlying logging framework
>> to change with minimal code changes. The proposed changes are motivated
>> by the following requirements, which we think Hadoop's logging should
>> meet:
>> - Hadoop's logs should be amenable to analysis by tools like grep,
>>   sed, awk etc.
>> - Log entries should be clearly annotated with a timestamp and a
>>   logging level
>> - Log entries should be traceable to the subsystem from which they
>>   originated
>> - The logging implementation should allow log entries to be annotated
>>   with source code location information like class name, method name,
>>   file and line number, without requiring code changes
>> - It should be possible to change the logging implementation used
>>   without having to change thousands of lines of code
>> - The mapping of loggers to destinations (files, directories, servers
>>   etc.) should be specified and modifiable via configuration
>>
>> Uniform logging format:
>> All Hadoop logs should have the following structure:
>> <Header>\n
>> <LogEntry>\n [<Exception>\n]
>> .
>> .
>> .
>> where the header line specifies the format of each log entry. The
>> header line has the format '# <Fieldname> <Fieldname>...\n'.
>> The default header, i.e. the default format of each log entry, is
>> '# Timestamp Level LoggerName Message', where:
>> - Timestamp is a date and time in the format MM/DD/YYYY:HH:MM:SS
>> - Level is the logging level (FATAL, WARN, DEBUG, TRACE, etc.)
>> - LoggerName is the short name of the logging subsystem from which the
>>   message originated, e.g. fs.FSNamesystem, dfs.Datanode etc.
>> - Message is the log message produced
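A minimal sketch of the proposed layout in Java, assuming the default header. `UniformLogFormat` and its methods are hypothetical names for this illustration, not Hadoop code. Because the timestamp contains no spaces, an entry can be split on whitespace into exactly as many pieces as the header names, which leaves the free-text Message intact; optional exception lines following an entry are not handled here.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical writer/parser for the uniform format described above.
final class UniformLogFormat {
    static final List<String> FIELDS = List.of("Timestamp", "Level", "LoggerName", "Message");
    // MM/DD/YYYY:HH:MM:SS expressed in java.time pattern letters (mm = minutes, ss = seconds).
    static final DateTimeFormatter TIMESTAMP = DateTimeFormatter.ofPattern("MM/dd/yyyy:HH:mm:ss");

    // Header line: '# <Fieldname> <Fieldname>...'
    static String header() {
        return "# " + String.join(" ", FIELDS);
    }

    // One log entry in the default field order.
    static String entry(LocalDateTime time, String level, String loggerName, String message) {
        return TIMESTAMP.format(time) + " " + level + " " + loggerName + " " + message;
    }

    // Split an entry using the field names from the header. The split limit keeps
    // any spaces inside the last field (Message).
    static Map<String, String> parse(String headerLine, String entryLine) {
        if (!headerLine.startsWith("# ")) {
            throw new IllegalArgumentException("not a header line: " + headerLine);
        }
        String[] names = headerLine.substring(2).split(" ");
        String[] values = entryLine.split(" ", names.length);
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < names.length && i < values.length; i++) {
            fields.put(names[i], values[i]);
        }
        return fields;
    }
}
```

The same property is what makes the format friendly to awk: the first three whitespace-separated columns are always Timestamp, Level and LoggerName.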
>>
>> Why Apache commons logging and Log4J?
>> Apache commons logging is a facade meant to be used as a wrapper around
>> an underlying logging implementation. Bridges from Apache commons
>> logging to popular logging implementations (Java logging, Log4J, Avalon
>> etc.) are implemented and available as part of the commons logging
>> distribution. Implementing a bridge to an unsupported implementation is
>> fairly straightforward and involves subclassing the commons logging
>> LogFactory class and implementing its Log interface. Making all logging
>> calls through Apache commons logging would, in the best case, let us
>> move to a different logging implementation simply by changing
>> configuration. Even otherwise, it keeps code churn to a minimum.
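The facade idea can be sketched without the real library. The types below (`FacadeLog`, `JulLog`, `MemoryLog`, `FacadeLogFactory`) are hypothetical stand-ins chosen for this illustration; they mirror the general shape of commons logging (call sites depend only on an interface obtained from a factory) but are not its actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical facade interface: the only logging type application code sees.
interface FacadeLog {
    void info(String message);
    void warn(String message);
}

// Backend 1: bridge to java.util.logging.
final class JulLog implements FacadeLog {
    private final Logger delegate;
    JulLog(String name) { this.delegate = Logger.getLogger(name); }
    public void info(String message) { delegate.log(Level.INFO, message); }
    public void warn(String message) { delegate.log(Level.WARNING, message); }
}

// Backend 2: in-memory recorder, standing in for any other implementation.
final class MemoryLog implements FacadeLog {
    final List<String> records = new ArrayList<>();
    private final String name;
    MemoryLog(String name) { this.name = name; }
    public void info(String message) { records.add("INFO " + name + " " + message); }
    public void warn(String message) { records.add("WARN " + name + " " + message); }
}

// Hypothetical factory: the backend is selected by a configuration value,
// so switching implementations requires no change at the call sites.
final class FacadeLogFactory {
    private static final Map<String, Function<String, FacadeLog>> BACKENDS =
        Map.<String, Function<String, FacadeLog>>of("jul", JulLog::new, "memory", MemoryLog::new);

    static FacadeLog getLog(String loggerName, String backend) {
        Function<String, FacadeLog> ctor = BACKENDS.get(backend);
        if (ctor == null) {
            throw new IllegalArgumentException("unknown logging backend: " + backend);
        }
        return ctor.apply(loggerName);
    }
}
```

Adding a bridge for a new framework means adding one class and one registry entry; the thousands of `log.info(...)` calls stay untouched.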
>>
>> Log4J offers a few benefits over java.util.logging that make it a more
>> desirable choice for the logging back end:
>> - Configuration flexibility: The mapping of loggers to destinations
>>   (files, sockets etc.) can be completely specified in configuration.
>>   This is possible with Java logging as well; however, its
>>   configuration is a lot more restrictive. For instance, with Java
>>   logging all log files must have names derived from the same pattern.
>>   For the namenode, log files could be named with the pattern
>>   "%h/namenode%u.log", which would put log files in the user.home
>>   directory with names like namenode0.log. With Log4J it would be
>>   possible to configure the namenode to emit log files with different
>>   names, say heartbeats.log, namespace.log, clients.log etc.
>>   Configuration variables in Log4J can also have the values of system
>>   properties embedded in them.
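To make the configuration point concrete, here is a rough sketch of the two features mentioned: routing loggers to different files by logger name, and expanding `${...}` system-property references inside configured values. It does not use Log4J. The class `LogRouting`, the logger names, the file names, and the `hadoop.log.dir` property are all hypothetical. Routing by the longest matching logger-name prefix is a simplification chosen for this sketch; Log4J's own appender inheritance is additive by default.

```java
import java.util.Map;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical router: logger-name prefix -> file name template.
final class LogRouting {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replace each ${name} with the matching property; unknown names become "".
    static String expand(String value, Properties props) {
        Matcher m = VAR.matcher(value);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement(props.getProperty(m.group(1), "")));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Choose the route whose prefix is the longest match for the logger name
    // (either the exact name or a dotted ancestor), then expand its template.
    static String fileFor(String loggerName, Map<String, String> routes, Properties props) {
        String best = null;
        for (String prefix : routes.keySet()) {
            boolean matches = loggerName.equals(prefix) || loggerName.startsWith(prefix + ".");
            if (matches && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        return best == null ? null : expand(routes.get(best), props);
    }
}
```

In a real deployment the `Properties` argument would typically be `System.getProperties()`, so the log directory can be set with `-D` on the command line.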
>> - Takes wrappers into account: Log4J takes into account the possibility
>>   that an application may be invoking it via a wrapper, such as Apache
>>   commons logging. This is important because logging event objects must
>>   be able to infer the context of the logging call, such as class name,
>>   method name etc. Inferring context is a relatively expensive operation
>>   that involves creating an exception and examining its stack trace to
>>   find the frame just before the first frame of the logging framework.
>>   It is therefore done lazily, only when this information actually needs
>>   to be logged. Log4J can be instructed to look for the frame
>>   corresponding to the wrapper class; Java logging cannot. With Java
>>   logging this means that a) the bridge from Apache commons logging is
>>   responsible for inferring the calling context and setting it in the
>>   logging event, and b) this inference has to be done on every logging
>>   call, regardless of whether or not it is needed.
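The frame-skipping technique described above can be sketched in plain Java. `WrapperLog` is a hypothetical stand-in for a facade such as commons logging, and `NameNodeDemo` is a hypothetical caller. The search walks the stack from the innermost frame, skips frames until it has seen the wrapper class, and reports the first frame after it. The location is computed only when `withLocation` is true, reflecting the point that the stack walk is expensive and worth doing lazily.

```java
// Hypothetical helper: find the application frame that called a wrapper class.
final class CallerInference {
    static StackTraceElement callerOf(String wrapperClassName) {
        // Creating a Throwable captures the current stack (the expensive part).
        StackTraceElement[] frames = new Throwable().getStackTrace();
        boolean seenWrapper = false;
        for (StackTraceElement frame : frames) {
            if (frame.getClassName().equals(wrapperClassName)) {
                seenWrapper = true;
            } else if (seenWrapper) {
                return frame; // first frame outside the wrapper: the real caller
            }
        }
        return null; // wrapper not on the stack
    }
}

// Hypothetical wrapper standing in for a logging facade.
final class WrapperLog {
    static String info(String message, boolean withLocation) {
        if (!withLocation) {
            return message; // cheap path: no stack walk at all
        }
        StackTraceElement caller = CallerInference.callerOf(WrapperLog.class.getName());
        String where = caller == null
            ? "?"
            : caller.getClassName() + "." + caller.getMethodName() + ":" + caller.getLineNumber();
        return where + " " + message;
    }
}

// Hypothetical application class whose calls should be attributed to itself.
final class NameNodeDemo {
    static String heartbeat() {
        return WrapperLog.info("heartbeat received", true);
    }
}
```

Without knowing the wrapper's class name, the search would stop at `WrapperLog.info` and attribute every message to the wrapper, which is the problem described for Java logging.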
>> - More handy features: Log4J has some handy features that Java logging
>>   doesn't. A couple of examples:
>>   a) Date-based rolling of log files
>>   b) Format control through configuration. Log4J has a PatternLayout
>>      class that can be configured to generate logs with a user-specified
>>      pattern. The logging format described above can be expressed as
>>      "%d{MM/dd/yyyy:HH:mm:ss} %p %c{2} %m%n". The format specifiers
>>      indicate that each log line should have the date and time, followed
>>      by the logging level or priority, the logger name, and the
>>      application-generated message, terminated by a line separator.
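For a feel of what pattern-driven formatting does, here is a toy interpreter for the handful of conversion specifiers used in this proposal (`%d{...}`, `%p`, `%c{n}`, `%m`, `%n`). It is not Log4J's PatternLayout, which supports many more specifiers and options; `MiniPatternLayout` is a hypothetical class showing how a single configured string fixes field order and date format. The example pattern orders the fields Timestamp, Level, LoggerName, Message to match the default header, and uses `ss` for seconds, since in Java date patterns `SS` means milliseconds. `rolledFileName` illustrates date-based rolling with an assumed naming scheme (base name plus ISO date).

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;

// Hypothetical, much-reduced imitation of a pattern-based layout.
final class MiniPatternLayout {
    private final String pattern;

    MiniPatternLayout(String pattern) { this.pattern = pattern; }

    // Keep only the last n dot-separated components of a logger name (like %c{n}).
    static String lastComponents(String name, int n) {
        String[] parts = name.split("\\.");
        int from = Math.max(0, parts.length - n);
        return String.join(".", Arrays.copyOfRange(parts, from, parts.length));
    }

    String format(LocalDateTime time, String level, String loggerName, String message) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < pattern.length()) {
            char ch = pattern.charAt(i);
            if (ch != '%' || i + 1 >= pattern.length()) {
                out.append(ch); // literal text is copied unchanged
                i++;
                continue;
            }
            char conv = pattern.charAt(i + 1);
            i += 2;
            String option = null; // optional {...} argument after the specifier
            if (i < pattern.length() && pattern.charAt(i) == '{') {
                int close = pattern.indexOf('}', i);
                if (close < 0) {
                    throw new IllegalArgumentException("unclosed { in pattern: " + pattern);
                }
                option = pattern.substring(i + 1, close);
                i = close + 1;
            }
            switch (conv) {
                case 'd': // the sketch's own default when no date format is given
                    out.append(DateTimeFormatter.ofPattern(option == null ? "yyyy-MM-dd HH:mm:ss" : option).format(time));
                    break;
                case 'p': out.append(level); break;
                case 'c': out.append(option == null ? loggerName : lastComponents(loggerName, Integer.parseInt(option))); break;
                case 'm': out.append(message); break;
                case 'n': out.append(System.lineSeparator()); break;
                case '%': out.append('%'); break;
                default: out.append('%').append(conv); break; // unknown: emit as-is
            }
        }
        return out.toString();
    }

    // Date-based rolling under an assumed scheme: "<base>.<yyyy-MM-dd>".
    static String rolledFileName(String baseName, LocalDate day) {
        return baseName + "." + DateTimeFormatter.ISO_LOCAL_DATE.format(day);
    }
}
```

Changing the field order or date format then means editing one configuration string rather than the code that emits log records.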
>
> -- 
> This message is automatically generated by JIRA.
>

