hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-342) Design/Implement a tool to support archival and analysis of logfiles.
Date Thu, 06 Jul 2006 05:15:30 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-342?page=comments#action_12419402 ] 

Arun C Murthy commented on HADOOP-342:

I concur with the need for an (optional?) HTTP-based map input... I'll start on it.
(I have some ideas about generalising this infrastructure, which I'm in the process of compiling
and will send over in a separate email.)

Eric: Apologies for not clarifying this earlier: logalyzer (as-is) can be used in either mode,
independently or together, i.e. it can be used for archival, for analysis (assuming the logs
are already in a given directory), or for both.

Doug: Can we get logalyzer as-is into the tree right away? Meanwhile I'll get started on the
HTTP-based map input enhancement. There is some interest in using it right away... I hope it
isn't too much of a problem.


> Design/Implement a tool to support archival and analysis of logfiles.
> ---------------------------------------------------------------------
>          Key: HADOOP-342
>          URL: http://issues.apache.org/jira/browse/HADOOP-342
>      Project: Hadoop
>         Type: New Feature

>     Reporter: Arun C Murthy
>  Attachments: logalyzer.patch
> Requirements:
>   a) Create a tool to support archival of logfiles (from diverse sources) in Hadoop's DFS.
>   b) The tool should also support analysis of the logfiles via grep/sort primitives.
>      The tool should allow for fairly generic pattern 'grep's and let users 'sort' the matching
>      lines (from grep) on 'columns' of their choice.
>   E.g. from Hadoop logs: look for all log lines with 'FATAL' and sort them based on timestamps
>   (column x) and then on column y (column x, followed by column y).
> Design/Implementation:
>   a) Log Archival
>     Archival of logs from diverse sources can be accomplished using the *distcp* tool
>   b) Log analysis
>     The idea is to enable users of the tool to perform analysis of logs via grep/sort.
>     This can be accomplished via a relatively simple Map-Reduce task where the map does
>     the *grep* for the given pattern via RegexMapper and then the implicit *sort* (reducer) is
>     used with a custom Comparator which performs the user-specified comparison (columns).
>     The sort/grep specs can be fairly powerful by letting the user of the tool use Java's
>     built-in regex patterns (java.util.regex).

This message is automatically generated by JIRA.
