hadoop-common-dev mailing list archives

From "Arkady Borkovsky" <ark...@inktomi.com>
Subject Re: [jira] Commented: (HADOOP-342) Design/Implement a tool to support archival and analysis of logfiles.
Date Tue, 04 Jul 2006 21:22:20 GMT
It would be very nice to be able to see (analyze) the logs of a task
that is still running.

-- ab

On Jul 3, 2006, at 12:39 PM, Arun C Murthy (JIRA) wrote:

>     [ http://issues.apache.org/jira/browse/HADOOP-342?page=comments#action_12419011 ]
>
> Arun C Murthy commented on HADOOP-342:
> --------------------------------------
>
> Should have clarified this: the plan is to let the user specify an  
> output directory in which a single text file will contain the output  
> of the 'analysis'.
>
> Generic Sorter:
>
>   The generic sorter basically lets the user specify a column  
> separator and a spec for priority of columns.
>   The Comparator's *compare* function (implements WritableComparable)  
> then splits each sequence of data on the user-specified separator  
> and compares the two resulting column sequences according to the given priorities.
>
>   E.g. -sortColumnSpec 2,0,1 -separator \t
>   (0-based columns)
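
To make the column-priority idea concrete, here is a minimal sketch in plain Java. It is not the HADOOP-342 code and does not use any Hadoop class: the name ColumnSpecComparator is hypothetical, the separator is assumed to be a literal string, columns are assumed to compare as plain strings, and a missing column is assumed to compare as the empty string.

```java
import java.util.Comparator;
import java.util.regex.Pattern;

// Hypothetical illustration only: not the HADOOP-342 patch and not a Hadoop API.
final class ColumnSpecComparator implements Comparator<String> {
    private final int[] columns;     // 0-based column indices, highest priority first
    private final String separator;  // literal separator, e.g. "\t"

    ColumnSpecComparator(String sortColumnSpec, String separator) {
        // Parse a spec such as "2,0,1" into column indices.
        String[] parts = sortColumnSpec.split(",");
        this.columns = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            this.columns[i] = Integer.parseInt(parts[i].trim());
        }
        this.separator = separator;
    }

    @Override
    public int compare(String a, String b) {
        // Quote the separator so it is matched literally; -1 keeps trailing empty columns.
        String[] left = a.split(Pattern.quote(separator), -1);
        String[] right = b.split(Pattern.quote(separator), -1);
        for (int col : columns) {
            // Assumption: a missing column compares as the empty string.
            String l = col < left.length ? left[col] : "";
            String r = col < right.length ? right[col] : "";
            int cmp = l.compareTo(r);
            if (cmp != 0) {
                return cmp;  // first differing column in priority order decides
            }
        }
        return 0;  // equal on every column named in the spec
    }
}
```

With this sketch, lines.sort(new ColumnSpecComparator("2,0,1", "\t")) would order a list of tab-separated lines by column 2, then 0, then 1. Numeric or timestamp columns would probably need a typed comparison rather than String.compareTo.
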
>
>   If there is enough interest, I can push this into mapred.lib.  
> Appreciate any suggestions.
>
> thanks,
> Arun
>
>
>> Design/Implement a tool to support archival and analysis of logfiles.
>> ---------------------------------------------------------------------
>>
>>          Key: HADOOP-342
>>          URL: http://issues.apache.org/jira/browse/HADOOP-342
>>      Project: Hadoop
>>         Type: New Feature
>>     Reporter: Arun C Murthy
>>
>> Requirements:
>>   a) Create a tool to support archival of logfiles (from diverse  
>> sources) in hadoop's dfs.
>>   b) The tool should also support analysis of the logfiles via  
>> grep/sort primitives. The tool should allow for fairly generic  
>> pattern 'grep's and let users 'sort' the matching lines (from grep)  
>> on 'columns' of their choice.
>>   E.g. from hadoop logs: Look for all log-lines with 'FATAL' and sort  
>> them based on timestamps (column x) and then on column y (column x,  
>> followed by column y).
>>
>> Design/Implementation:
>>   a) Log Archival
>>     Archival of logs from diverse sources can be accomplished using  
>> the *distcp* tool (HADOOP-341).
>>
>>   b) Log analysis
>>     The idea is to enable users of the tool to perform analysis of  
>> logs via grep/sort primitives.
>>     This can be accomplished via a relatively simple Map-Reduce task  
>> where the map does the *grep* for the given pattern via RegexMapper  
>> and then the implicit *sort* (reducer) is used with a custom  
>> Comparator which performs the user-specified comparison (columns).
>>     The sort/grep specs can be fairly powerful by letting the user of  
>> the tool use java's in-built regex patterns (java.util.regex).
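
As a rough single-process analogue of the grep-then-sort flow described above, the sketch below filters lines with java.util.regex and then sorts the matches on two columns. It is only an illustration under stated assumptions: LogGrepSort and grepAndSort are hypothetical names, nothing here uses RegexMapper or any Map-Reduce API, the separator is passed as a regex, and columns are compared as plain strings.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical, single-process stand-in for the map (grep) and sort steps;
// it does not use Hadoop's RegexMapper or any Map-Reduce API.
final class LogGrepSort {
    static List<String> grepAndSort(List<String> lines, String regex,
                                    String separatorRegex, int columnX, int columnY) {
        Pattern pattern = Pattern.compile(regex);
        List<String> matches = new ArrayList<>();
        for (String line : lines) {
            // "grep": keep lines in which the pattern occurs anywhere.
            if (pattern.matcher(line).find()) {
                matches.add(line);
            }
        }
        // "sort": column x first, then column y, both compared as strings.
        Comparator<String> byX =
            Comparator.comparing((String l) -> column(l, separatorRegex, columnX));
        matches.sort(byX.thenComparing((String l) -> column(l, separatorRegex, columnY)));
        return matches;
    }

    // Returns the requested 0-based column, or "" if the line has too few columns.
    private static String column(String line, String separatorRegex, int index) {
        String[] fields = line.split(separatorRegex, -1);
        return index < fields.length ? fields[index] : "";
    }
}
```

A call such as LogGrepSort.grepAndSort(lines, "FATAL", " ", 1, 3) would keep the lines containing FATAL and order them by column 1, then column 3. In the proposed tool the filtering would happen in the map and the ordering in the framework's sort, so this only mirrors the logic, not the distribution.
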
>

