hadoop-common-dev mailing list archives

From Sanjay Dahiya <sanj...@yahoo-inc.com>
Subject Re: [jira] Commented: (HADOOP-342) Design/Implement a tool to support archival and analysis of logfiles.
Date Wed, 05 Jul 2006 07:39:02 GMT

On 05-Jul-06, at 11:45 AM, Eric Baldeschwieler wrote:

> yupp.  Perhaps the HTTP approach would support that too.  Long  
> term, this is one of the reasons atomic appends would be cool.   
> That would allow us to log in realtime to HDFS.

For now, I am working on a Log4J extension that automatically rolls
the older logs (based on time and size) into a well-defined directory
structure in HDFS. The same needs to be done for Map Reduce jobs, where
all logs generated by a job go into its own directory (one per MR
job). This can also be used along with log analysis tools.
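A minimal sketch of the roll decision and directory layout described above, assuming a "size OR age" trigger and a hypothetical layout of <base>/jobs/<jobId>/<host>/ for job logs and <base>/daemons/<host>/ for everything else. LogRollPolicy and the layout are illustrative assumptions, not Log4J or Hadoop APIs, and the actual copy into HDFS is left out.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

/**
 * Hypothetical roll policy (not a Log4J or Hadoop API): roll the active log
 * file once it reaches a size limit OR has been open for a time limit, and
 * compute where the rolled file should land in a per-job / per-host tree.
 */
final class LogRollPolicy {
    private final long maxBytes;      // size threshold in bytes
    private final long maxAgeMillis;  // time threshold in milliseconds

    LogRollPolicy(long maxBytes, long maxAgeMillis) {
        this.maxBytes = maxBytes;
        this.maxAgeMillis = maxAgeMillis;
    }

    /** True if the file has grown too large or has been open too long. */
    boolean shouldRoll(long currentBytes, long openedAtMillis, long nowMillis) {
        return currentBytes >= maxBytes || nowMillis - openedAtMillis >= maxAgeMillis;
    }

    /**
     * Builds an archive path such as
     *   /logs/jobs/job_0001/host1/20060705-073902.log  (job logs), or
     *   /logs/daemons/host1/20060705-073902.log        (jobId == null).
     * The layout is an assumption chosen for illustration.
     */
    static String archivePath(String baseDir, String jobId, String host, long rolledAtMillis) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd-HHmmss");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // stable names across hosts
        String stamp = fmt.format(new Date(rolledAtMillis));
        String prefix = (jobId == null) ? baseDir + "/daemons" : baseDir + "/jobs/" + jobId;
        return prefix + "/" + host + "/" + stamp + ".log";
    }
}
```

Keeping one directory per job means a later analysis pass can take a single job's directory as its input without filtering out unrelated files.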

~Sanjay

>
> On Jul 4, 2006, at 2:22 PM, Arkady Borkovsky wrote:
>
>> It would be very nice to be able to see (analyze) the logs of a
>> task that is still running.
>>
>> -- ab
>>
>> On Jul 3, 2006, at 12:39 PM, Arun C Murthy (JIRA) wrote:
>>
>>>     [ http://issues.apache.org/jira/browse/HADOOP-342?page=comments#action_12419011 ]
>>>
>>> Arun C Murthy commented on HADOOP-342:
>>> --------------------------------------
>>>
>>> Should have clarified this: the plan is to let the user specify  
>>> an output directory in which a single text file will contain the  
>>> output of the 'analysis'.
>>>
>>> Generic Sorter:
>>>
>>>   The generic sorter basically lets the user specify a column  
>>> separator and a spec for priority of columns.
>>>   The Comparator's *compare* function (implements
>>> WritableComparable) then splits each record on the
>>> user-specified separator and compares the two records column by
>>> column, in the given priority order.
>>>
>>>   E.g. -sortColumnSpec 2,0,1 -separator \t
>>>   (0-based columns)
>>>
>>>   If there is enough interest, I can push this into mapred.lib.  
>>> Appreciate any suggestions.
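A minimal sketch of the comparison the generic sorter describes, assuming plain lexicographic String comparison per column and a literal (non-regex) separator. ColumnSpecComparator is a hypothetical name; it implements java.util.Comparator<String> so it runs standalone, whereas inside Hadoop the same logic would presumably sit in a WritableComparator subclass.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.regex.Pattern;

/**
 * Hypothetical "generic sorter" comparator: split each line on a
 * user-supplied separator and compare column by column in the order
 * given by a spec such as "2,0,1" (0-based columns).
 */
final class ColumnSpecComparator implements Comparator<String> {
    private final Pattern separator;
    private final int[] priorities;

    ColumnSpecComparator(String sortColumnSpec, String separator) {
        // Quote the separator so characters like '|' or '.' are matched literally.
        this.separator = Pattern.compile(Pattern.quote(separator));
        this.priorities = Arrays.stream(sortColumnSpec.split(","))
                .map(String::trim)
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    @Override
    public int compare(String a, String b) {
        String[] colsA = separator.split(a, -1);
        String[] colsB = separator.split(b, -1);
        for (int col : priorities) {
            int c = column(colsA, col).compareTo(column(colsB, col));
            if (c != 0) {
                return c; // the first differing column (by priority) decides
            }
        }
        return 0; // equal on every requested column
    }

    // Lines with fewer columns than requested compare as if the column were empty.
    private static String column(String[] cols, int index) {
        return index < cols.length ? cols[index] : "";
    }
}
```

With the spec from the message, new ColumnSpecComparator("2,0,1", "\t") orders lines on column 2, breaking ties on column 0 and then column 1. Because the comparison is lexicographic, numeric columns would need zero-padding or a numeric comparison to sort as numbers.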
>>>
>>> thanks,
>>> Arun
>>>
>>>
>>>> Design/Implement a tool to support archival and analysis of
>>>> logfiles.
>>>> ----------------------------------------------------------------------
>>>>
>>>>          Key: HADOOP-342
>>>>          URL: http://issues.apache.org/jira/browse/HADOOP-342
>>>>      Project: Hadoop
>>>>         Type: New Feature
>>>>     Reporter: Arun C Murthy
>>>>
>>>> Requirements:
>>>>   a) Create a tool to support archival of logfiles (from diverse
>>>> sources) in Hadoop's DFS.
>>>>   b) The tool should also support analysis of the logfiles via  
>>>> grep/sort primitives. The tool should allow for fairly generic  
>>>> pattern 'grep's and let users 'sort' the matching lines (from  
>>>> grep) on 'columns' of their choice.
>>>>   E.g. from Hadoop logs: look for all log lines containing 'FATAL'
>>>> and sort them first on the timestamp (column x) and then on
>>>> column y.
>>>> Design/Implementation:
>>>>   a) Log Archival
>>>>     Archival of logs from diverse sources can be accomplished  
>>>> using the *distcp* tool (HADOOP-341).
>>>>
>>>>   b) Log analysis
>>>>     The idea is to enable users of the tool to perform analysis  
>>>> of logs via grep/sort primitives.
>>>>     This can be accomplished via a relatively simple Map-Reduce
>>>> job, where the map does the *grep* for the given pattern via
>>>> RegexMapper and the framework's implicit *sort* (before the
>>>> reducer) uses a custom Comparator that performs the
>>>> user-specified comparison (on columns).
>>>>     The sort/grep specs can be fairly powerful, since users of the
>>>> tool can use Java's built-in regex patterns (java.util.regex).
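The grep/sort design above can be tried out locally before writing the Map-Reduce job. This is a single-process sketch under stated assumptions: GrepSortSketch, byColumns and grepAndSort are hypothetical names, the filter step only mirrors the role RegexMapper plays in the map, and List.sort stands in for the framework's sort with a custom Comparator. Both the grep pattern and the column separator are java.util.regex patterns.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Pattern;

/**
 * Local simulation (not Hadoop code): "map" = keep lines matching a regex,
 * "sort" = order the survivors by user-chosen columns.
 */
final class GrepSortSketch {
    /** Comparator over the given 0-based columns, highest priority first. */
    static Comparator<String> byColumns(String separatorRegex, int... columns) {
        Pattern sep = Pattern.compile(separatorRegex);
        Comparator<String> order = (a, b) -> 0; // start from "all lines equal"
        for (int col : columns) {
            Function<String, String> key = line -> column(sep, line, col);
            order = order.thenComparing(key); // each column only breaks earlier ties
        }
        return order;
    }

    /** Keeps lines where the pattern matches anywhere (grep semantics), then sorts them. */
    static List<String> grepAndSort(List<String> lines, String grepRegex,
                                    Comparator<String> order) {
        Pattern grep = Pattern.compile(grepRegex);
        List<String> matched = new ArrayList<>();
        for (String line : lines) {
            if (grep.matcher(line).find()) {
                matched.add(line);
            }
        }
        matched.sort(order);
        return matched;
    }

    // Missing columns compare as the empty string.
    private static String column(Pattern sep, String line, int index) {
        String[] cols = sep.split(line, -1);
        return index < cols.length ? cols[index] : "";
    }
}
```

For the 'FATAL lines sorted by timestamp' requirement, a call such as grepAndSort(lines, "FATAL", byColumns("\\s+", 0, 1)) would do, assuming the date and time are the first two whitespace-separated columns of each log line.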
>>>
>>>
>>
>
>

