hadoop-common-dev mailing list archives

From Eric Baldeschwieler <eri...@yahoo-inc.com>
Subject Re: [jira] Commented: (HADOOP-342) Design/Implement a tool to support archival and analysis of logfiles.
Date Wed, 05 Jul 2006 06:15:12 GMT
Yup.  Perhaps the HTTP approach would support that too.  Long term,
this is one of the reasons atomic appends would be cool: they would
let us log to HDFS in real time.

On Jul 4, 2006, at 2:22 PM, Arkady Borkovsky wrote:

> It would be very nice to be able to see (analyze) the logs of a
> task that is still running.
>
> -- ab
>
> On Jul 3, 2006, at 12:39 PM, Arun C Murthy (JIRA) wrote:
>
>>     [ http://issues.apache.org/jira/browse/HADOOP-342?page=comments#action_12419011 ]
>>
>> Arun C Murthy commented on HADOOP-342:
>> --------------------------------------
>>
>> Should have clarified this: the plan is to let the user specify an  
>> output directory in which a single text file will contain the  
>> output of the 'analysis'.
>>
>> Generic Sorter:
>>
>>   The generic sorter lets the user specify a column separator and
>> a spec giving the priority order of the columns.
>>   The Comparator's *compare* function (implementing
>> WritableComparable) splits each record on the user-specified
>> separator and then compares the two records column by column, in
>> the given priority order.
>>
>>   E.g. -sortColumnSpec 2,0,1 -separator \t
>>   (0-based columns)
>>
>>   If there is enough interest, I can push this into mapred.lib.  
>> Appreciate any suggestions.
>>
>> thanks,
>> Arun
>>
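To make the column-priority comparison above concrete, here is a minimal, Hadoop-free sketch. The class and field names (ColumnComparator, sortSpec) are illustrative, not the actual HADOOP-342 code, and the real Comparator would operate on WritableComparable byte streams rather than Strings:

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch: compare two delimited records on a user-given
// column priority, e.g. -sortColumnSpec 2,0,1 -separator \t
// (column indices are 0-based, as in the example above).
public class ColumnComparator implements Comparator<String> {
    private final String separator; // treated as a regex by String.split
    private final int[] sortSpec;   // column priority, e.g. {2, 0, 1}

    public ColumnComparator(String separator, int[] sortSpec) {
        this.separator = separator;
        this.sortSpec = sortSpec;
    }

    @Override
    public int compare(String a, String b) {
        String[] colsA = a.split(separator, -1);
        String[] colsB = b.split(separator, -1);
        // Walk the columns in priority order; first difference decides.
        for (int col : sortSpec) {
            int cmp = colsA[col].compareTo(colsB[col]);
            if (cmp != 0) {
                return cmp;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        String[] lines = {
            "b\t1\tz",
            "a\t2\tz",
            "c\t0\ty"
        };
        // Equivalent of: -sortColumnSpec 2,0,1 -separator \t
        // The "y" line sorts first on column 2; the two "z" lines
        // then tie-break on column 0.
        Arrays.sort(lines, new ColumnComparator("\t", new int[]{2, 0, 1}));
        for (String line : lines) {
            System.out.println(line);
        }
    }
}
```

Note that a literal separator such as `|` would need `Pattern.quote` before being handed to `split`.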
>>
>>> Design/Implement a tool to support archival and analysis of  
>>> logfiles.
>>> ---------------------------------------------------------------------
>>>
>>>          Key: HADOOP-342
>>>          URL: http://issues.apache.org/jira/browse/HADOOP-342
>>>      Project: Hadoop
>>>         Type: New Feature
>>
>>>     Reporter: Arun C Murthy
>>
>>>
>>> Requirements:
>>>   a) Create a tool to support archival of logfiles (from diverse
>>> sources) in hadoop's dfs.
>>>   b) The tool should also support analysis of the logfiles via  
>>> grep/sort primitives. The tool should allow for fairly generic  
>>> pattern 'grep's and let users 'sort' the matching lines (from  
>>> grep) on 'columns' of their choice.
>>>   E.g. from hadoop logs: look for all log-lines with 'FATAL' and
>>> sort them first on the timestamp column (column x) and then on
>>> column y.
>>> Design/Implementation:
>>>   a) Log Archival
>>>     Archival of logs from diverse sources can be accomplished  
>>> using the *distcp* tool (HADOOP-341).
>>>
>>>   b) Log analysis
>>>     The idea is to enable users of the tool to perform analysis  
>>> of logs via grep/sort primitives.
>>>     This can be accomplished via a relatively simple Map-Reduce
>>> job where the map does the *grep* for the given pattern via
>>> RegexMapper and the implicit *sort* (reduce) is used with a
>>> custom Comparator which performs the user-specified comparison
>>> on columns.
>>>     The sort/grep specs can be fairly powerful by letting the  
>>> user of the tool use java's in-built regex patterns  
>>> (java.util.regex).
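The grep-then-sort design described above can be sketched in a single process with plain java.util.regex, no Hadoop. The names here (LogGrepSort, grepAndSort) are illustrative only; in the actual proposal the grep runs as a RegexMapper map phase and the ordering comes from the framework's sort with the custom Comparator:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Single-process sketch of the proposed analysis: the "map" step keeps
// lines matching a regex (what RegexMapper does in the real job), and
// the "sort" step orders the matches (the framework's implicit sort).
public class LogGrepSort {
    public static List<String> grepAndSort(List<String> logLines, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> matches = new ArrayList<>();
        for (String line : logLines) {
            if (pattern.matcher(line).find()) {  // "map": grep
                matches.add(line);
            }
        }
        matches.sort(String::compareTo);          // "sort": plain lexicographic
        return matches;
    }

    public static void main(String[] args) {
        List<String> logs = List.of(
            "2006-07-04 12:01 INFO  startup",
            "2006-07-04 12:03 FATAL disk failure",
            "2006-07-04 12:02 FATAL out of memory"
        );
        // Since the timestamp leads each line, lexicographic order here
        // is timestamp order -- the "sort FATAL lines by time" example.
        for (String line : grepAndSort(logs, "FATAL")) {
            System.out.println(line);
        }
    }
}
```

A column-aware Comparator (as in the generic sorter discussed earlier in the thread) would replace the plain lexicographic sort when the sort key is not a line prefix.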
>>
>> -- 
>> This message is automatically generated by JIRA.
>> -
>> If you think it was sent incorrectly contact one of the  
>> administrators:
>>    http://issues.apache.org/jira/secure/Administrators.jspa
>> -
>> For more information on JIRA, see:
>>    http://www.atlassian.com/software/jira
>>
>

