hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-342) Design/Implement a tool to support archival and analysis of logfiles.
Date Thu, 06 Jul 2006 06:35:30 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-342?page=comments#action_12419414 ] 

Arun C Murthy commented on HADOOP-342:
--------------------------------------

Summary of logalyzer usage:

Logalyzer.0.0.1
Usage: 
Logalyzer [-archive -logs <urlsFile>] -archiveDir <archiveDirectory> -grep <pattern>
-sort <column1,column2,...> -separator <separator> -analysis <outputDirectory>

Usage Scenarios:
---------------------------

a) Archive only:

$ java org.apache.hadoop.tools.Logalyzer -archive -logs <urlsFile> -archiveDir <archiveDirectory>

 Fetch the logs listed in <urlsFile> (an arbitrary combination of dfs- and http-based
logs) and archive them in <archiveDirectory> (in the dfs).

  Archival of logs from diverse sources is accomplished using the *distcp* tool (HADOOP-341).



b) Analyse only:
 
 $ java org.apache.hadoop.tools.Logalyzer -archiveDir <archiveDirectory> -grep <pattern>
-sort <column1,column2,...> -separator <separator> -analysis <outputDirectory>

  Analyse the logs in <archiveDirectory>, i.e. grep for <pattern> and sort the matching
lines on the given columns (split on <separator>), then store the output of the analysis
(as a single text file) in <outputDirectory>.

  This is accomplished via a Map-Reduce task: the map does the *grep* for the given pattern
via RegexMapper, and the implicit *sort* (reducer) is then used with a custom Comparator that
performs the user-specified comparison (on columns).
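
  To make the grep-then-sort idea concrete, here is a minimal single-process sketch in plain
Java. It is not the code from logalyzer.patch and uses no Hadoop classes: the class name
LogGrepSort, the method names, the 0-based column indices and the sample log lines are all
assumptions made for illustration. The filter loop stands in for the map (grep) step and the
Comparator stands in for the custom comparison applied during the sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical local stand-in for the grep (map) + sort (reduce) steps; not Hadoop code.
final class LogGrepSort {

    /** Compares two lines field by field on the given 0-based columns, in order. */
    static Comparator<String> columnComparator(String separator, int[] columns) {
        // Quote the separator so characters such as '|' are matched literally.
        Pattern sep = Pattern.compile(Pattern.quote(separator));
        return (a, b) -> {
            String[] fa = sep.split(a, -1);
            String[] fb = sep.split(b, -1);
            for (int col : columns) {
                // A line that lacks the column sorts as if the field were empty.
                String x = col < fa.length ? fa[col] : "";
                String y = col < fb.length ? fb[col] : "";
                int c = x.compareTo(y);
                if (c != 0) {
                    return c;
                }
            }
            return 0;
        };
    }

    /** Keeps the lines that contain a match for regex, then sorts them on the columns. */
    static List<String> grepAndSort(List<String> lines, String regex,
                                    String separator, int[] columns) {
        Pattern pattern = Pattern.compile(regex);
        List<String> matched = new ArrayList<>();
        for (String line : lines) {
            if (pattern.matcher(line).find()) {
                matched.add(line);
            }
        }
        matched.sort(columnComparator(separator, columns));
        return matched;
    }

    public static void main(String[] args) {
        // Made-up log lines: timestamp|level|component|message
        List<String> logs = Arrays.asList(
            "2006-07-06 06:35:30|FATAL|datanode|disk failure",
            "2006-07-06 06:30:00|INFO|namenode|started",
            "2006-07-05 23:59:59|FATAL|namenode|out of memory",
            "2006-07-06 06:35:30|FATAL|client|connection lost");
        // Grep for FATAL, then sort on the timestamp (column 0) and the component (column 2).
        for (String line : grepAndSort(logs, "FATAL", "|", new int[] {0, 2})) {
            System.out.println(line);
        }
    }
}
```

  The comparison here is a plain string comparison per field, which orders ISO-style
timestamps correctly; numeric columns would need a different per-field comparison.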


c) Archive and analyse:

  $ java org.apache.hadoop.tools.Logalyzer -archive -logs <urlsFile> -archiveDir <archiveDirectory>
-grep <pattern> -sort <column1,column2,...> -separator <separator> -analysis
<outputDirectory>
 
  Perform both a) and b) tasks.
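
  The three scenarios above differ only in which options are present. As a rough sketch of
how a command line could be mapped to a scenario: the flag names below are the ones from the
usage string, but the class LogalyzerArgs, its methods, the returned mode names and the rule
"analysis runs when -analysis is given" are assumptions for illustration, not the parsing
logic of the actual patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical option parser; only the flag names come from the usage string.
final class LogalyzerArgs {

    /** Collects flags into a map; -archive is a switch, every other flag takes one value. */
    static Map<String, String> parse(String[] args) {
        Map<String, String> opts = new HashMap<>();
        for (int i = 0; i < args.length; i++) {
            String flag = args[i];
            if (flag.equals("-archive")) {
                opts.put(flag, "true");
            } else if (i + 1 < args.length) {
                opts.put(flag, args[++i]);
            } else {
                throw new IllegalArgumentException("Missing value for " + flag);
            }
        }
        return opts;
    }

    /** Assumed rule: archive when -archive is set, analyse when -analysis is set. */
    static String mode(Map<String, String> opts) {
        boolean archive = opts.containsKey("-archive");
        boolean analyse = opts.containsKey("-analysis");
        if (archive && analyse) {
            return "archive+analyse";
        }
        if (archive) {
            return "archive";
        }
        if (analyse) {
            return "analyse";
        }
        return "none";
    }

    public static void main(String[] args) {
        System.out.println(mode(parse(args)));
    }
}
```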

       - * - * -

Arun

> Design/Implement a tool to support archival and analysis of logfiles.
> ---------------------------------------------------------------------
>
>          Key: HADOOP-342
>          URL: http://issues.apache.org/jira/browse/HADOOP-342
>      Project: Hadoop
>         Type: New Feature

>     Reporter: Arun C Murthy
>  Attachments: logalyzer.patch
>
> Requirements:
>   a) Create a tool to support archival of logfiles (from diverse sources) in hadoop's dfs.
>   b) The tool should also support analysis of the logfiles via grep/sort primitives.
> The tool should allow for fairly generic pattern 'grep's and let users 'sort' the matching
> lines (from grep) on 'columns' of their choice.
>   E.g. from hadoop logs: Look for all log-lines with 'FATAL' and sort them based on timestamps
> (column x) and then on column y (column x, followed by column y).
> Design/Implementation:
>   a) Log Archival
>     Archival of logs from diverse sources can be accomplished using the *distcp* tool
> (HADOOP-341).
>   
>   b) Log analysis
>     The idea is to enable users of the tool to perform analysis of logs via grep/sort
> primitives.
>     This can be accomplished via a relatively simple Map-Reduce task where the map does
> the *grep* for the given pattern via RegexMapper and then the implicit *sort* (reducer) is
> used with a custom Comparator which performs the user-specified comparison (columns).
>     The sort/grep specs can be fairly powerful by letting the user of the tool use Java's
> built-in regex patterns (java.util.regex).

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

