hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-291) Hadoop Log Archiver/Analyzer utility
Date Thu, 08 Jun 2006 10:41:29 GMT
Hadoop Log Archiver/Analyzer utility
------------------------------------

         Key: HADOOP-291
         URL: http://issues.apache.org/jira/browse/HADOOP-291
     Project: Hadoop
        Type: New Feature

  Components: util  
    Reporter: Arun C Murthy


Overview of the log archiver/analyzer utility...

1. Input
  The tool takes as input a list of directory URLs; each URL can also be associated with
a file pattern that specifies which files in that directory are to be used.
  e.g. http://g1015:50030/logs/hadoop-sameer-jobtracker-*
         file:///export/crawlspace/sanjay/hadoop/trunk/run/logs/hadoop-sanjay-namenode-*
         (local disk on the machine from which the job was submitted)
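
  A minimal sketch of how such an input could be split into a directory URL and a
file pattern. This is only an illustration, not part of the proposal: the function
names are hypothetical, and treating the last path component as a shell-style glob is
an assumption.

```python
import posixpath
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# '?' is left out on purpose: urlsplit would read it as the start of a query string.
WILDCARDS = "*["


def split_input_spec(spec):
    """Split 'scheme://host/dir/pattern' into (directory URL, file pattern).

    Assumption: if the last path component contains no wildcard, the whole
    path is a directory and every file in it is selected.
    """
    parts = urlsplit(spec)
    directory, last = posixpath.split(parts.path)
    if any(ch in last for ch in WILDCARDS):
        pattern = last
    else:
        directory, pattern = parts.path.rstrip("/"), "*"
    return f"{parts.scheme}://{parts.netloc}{directory}", pattern


def select_files(names, pattern):
    """Keep only the file names that match the pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]
```

  For instance, split_input_spec("http://g1015:50030/logs/hadoop-sameer-jobtracker-*")
yields the directory URL http://g1015:50030/logs and the pattern
hadoop-sameer-jobtracker-*.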

2. The tool supports 2 main functions:

  a) Archival
    Archive the logs in the DFS under the following hierarchy by default:
   /users/<username>/log-archive/YYYY/mm/dd/HHMMSS.log
   or, if the user specifies a directory, under:
   <input-dir>/YYYY/mm/dd/HHMMSS.log
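
    The layout above could be computed roughly as follows. This is a sketch under
assumptions: the function name and parameters are hypothetical, "mm" is read as the
month, and the message does not say which timestamp (archive time or log time) is
meant, so the caller passes one in.

```python
from datetime import datetime


def archive_path(timestamp, username=None, input_dir=None):
    """Build the DFS path for one archived log.

    Uses <input-dir> when given, otherwise /users/<username>/log-archive.
    """
    if input_dir is None:
        if username is None:
            raise ValueError("either username or input_dir is required")
        input_dir = f"/users/{username}/log-archive"
    # YYYY/mm/dd/HHMMSS, e.g. 2006/06/08/104129
    return f"{input_dir.rstrip('/')}/{timestamp.strftime('%Y/%m/%d/%H%M%S')}.log"
```

    With the date of this message, archive_path(datetime(2006, 6, 8, 10, 41, 29),
username="arun") gives /users/arun/log-archive/2006/06/08/104129.log.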

  b) Processing with simple sort/grep primitives
    Archive the logs as above, then grep for lines matching a given pattern (e.g. INFO), and
then sort them with a spec such as <logger><level><date>. (Note: This is proposed with the
current log4j-based logging in mind... do we need anything more generic?) The sort/grep specs
are user-provided, along with the directory URLs.
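
    To make the grep-then-sort idea concrete, here is a small in-memory sketch. It is
not the proposed map-reduce implementation. It assumes log lines of the form
"<date> <level> <logger>: <message>" with an ISO8601-style date; the regular
expression, the function name and the tuple form of the sort spec are all assumptions
made for illustration.

```python
import re

# Assumed line layout: "2006-06-08 10:41:29,123 INFO some.logger.Name: message"
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>\w+) (?P<logger>[\w.$]+): (?P<message>.*)$"
)


def grep_sort(lines, pattern, sort_spec):
    """Keep lines matching `pattern`, then sort them by the fields in `sort_spec`.

    `sort_spec` is a sequence of field names, e.g. ("logger", "level", "date").
    Matching lines that do not fit the assumed layout (such as stack-trace
    continuation lines) are skipped in this sketch.
    """
    matcher = re.compile(pattern)
    keyed = []
    for line in lines:
        if not matcher.search(line):
            continue
        parsed = LINE_RE.match(line)
        if parsed:
            keyed.append((tuple(parsed.group(f) for f in sort_spec), line))
    keyed.sort(key=lambda item: item[0])  # stable: ties keep input order
    return [line for _, line in keyed]
```

    Calling grep_sort(lines, "INFO", ("logger", "level", "date")) corresponds to the
<logger><level><date> spec mentioned above.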

3. Thoughts on implementation...

  a) Archival
    The current idea is to put a .jsp page (src/webapps) on each of the nodes, which then does
a *copyFromLocal* of the log file into the DFS. The jobtracker will fire n map tasks, which
only hit the .jsp page as per the directory URLs. The reduce task is a no-op and only collects
statistics on failures (if any).
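
    The division of work between the map and reduce sides might look like the sketch
below. It is plain Python rather than actual map-reduce code, and every name in it is
hypothetical. In particular, `trigger_copy` stands in for the HTTP request to the
.jsp page, whose URL and parameters are not specified here.

```python
def map_archive(dir_url, pattern, trigger_copy):
    """One map task: ask the node behind `dir_url` to copy its logs into DFS.

    `trigger_copy(dir_url, pattern)` is a caller-supplied placeholder for the
    request to the .jsp page; it is assumed to raise OSError on failure.
    """
    try:
        trigger_copy(dir_url, pattern)
        return dir_url, None
    except OSError as err:
        return dir_url, str(err)


def reduce_stats(results):
    """Reduce side: no data processing, only failure statistics."""
    failed = [(url, error) for url, error in results if error is not None]
    return {
        "total": len(results),
        "failed": len(failed),
        "failed_urls": [url for url, _ in failed],
    }
```

    Passing the request in as a function keeps the sketch independent of how the
.jsp page is eventually addressed.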

  b) Processing with sort/grep
    Here, the tool first archives the files as above, and then another set of map-reduce tasks
does the sort/grep on the files in the DFS with the given specs.


- * - * - 

 Suggestions/corrections welcome...

thanks,
Arun


