incubator-chukwa-user mailing list archives

From Stuti Awasthi <Stuti_Awas...@persistent.co.in>
Subject RE: Problem in chukwa output
Date Mon, 07 Jun 2010 08:45:02 GMT
Hi,

Sorry, I gave some wrong information earlier.

The log files I am using are of type SysLog, and the parser class being used
is org.apache.hadoop.chukwa.extraction.demux.processor.mapper.SysLog.

I am not using TsProcessor.java to parse the log files. I checked that AbstractProcessor.java
takes "\n" as the record delimiter.

The date format of my logs agrees with the date format expected by SysLog, so a
wrong date format is ruled out.

The problem is still the same: my data is getting split in the middle of a record,
and because of that the date parse exception is thrown.
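Since AbstractProcessor takes "\n" as the record delimiter, each physical line of a chunk becomes one record. The sketch below (plain Java for illustration, not Chukwa code) shows the consequence: when a chunk boundary falls in the middle of a line, the leftover fragment at the start of the next chunk becomes a "record" of its own, with no timestamp at the front.

```java
import java.util.ArrayList;
import java.util.List;

public class RecordSplit {

    // Split a chunk body on "\n", the record delimiter AbstractProcessor assumes.
    static List<String> records(String chunk) {
        List<String> out = new ArrayList<>();
        for (String r : chunk.split("\n")) {
            if (!r.isEmpty()) {
                out.add(r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The chunk boundary fell inside "maxlifetime", so chunk 2 starts mid-word.
        String chunk2 = "fetime) | xargs -n 200 -r -0 rm)\n"
                      + "Jun 4 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD (...)";
        // The first "record" has no leading date, so date parsing must fail on it.
        System.out.println(records(chunk2).get(0));
    }
}
```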

Chunk 1:

Jun 4 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime
] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime)
-print0 | xargs -n 200 -r -0 rm)
Jun 4 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime
] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime)
-print0 | xargs -n 200 -r -0 rm)
Jun 4 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD (  [ -x /usr/lib/php5/maxli

Chunk 2:

fetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime)
-print0 | xargs -n 200 -r -0 rm)
Jun 4 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime
] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime)
-print0 | xargs -n 200 -r -0 rm)
Jun 4 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime
] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime)
-print0 | xargs -n 200 -r -0 rm)
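The parse failure can be reproduced in isolation. The pattern below is an assumption about the syslog timestamp format ("MMM d HH:mm:ss"); the exact pattern used by the SysLog processor may differ, but the failure mode is the same: the first line of Chunk 1 parses, while the fragment that opens Chunk 2 does not.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

public class SyslogDateCheck {

    // Assumed syslog timestamp pattern; the real SysLog processor's pattern may differ.
    static final String PATTERN = "MMM d HH:mm:ss";

    static boolean parsesSyslogTimestamp(String line) {
        try {
            // Lenient parse: reads the leading timestamp and ignores the rest of the line.
            new SimpleDateFormat(PATTERN, Locale.US).parse(line);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // First line of Chunk 1: starts with a timestamp, so it parses.
        System.out.println(parsesSyslogTimestamp(
            "Jun 4 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD (...)"));  // true
        // First line of Chunk 2: a mid-record fragment, so parsing throws.
        System.out.println(parsesSyslogTimestamp(
            "fetime ] && [ -d /var/lib/php5 ]"));                               // false
    }
}
```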

Stuti


From: Bill Graham [mailto:billgraham@gmail.com]
Sent: Friday, June 04, 2010 3:33 AM
To: chukwa-user@hadoop.apache.org
Subject: Re: Problem in chukwa output

FYI, the TsProcessor is not the default processor, so if you want to use it you need to
configure it explicitly. If you have done that, note that the default time format of the
TsProcessor is 'yyyy-MM-dd HH:mm:ss,SSS', which is not what your logs have. If you process
logs like the ones you show with the TsProcessor without overriding the default time format,
you will get many InError files as output.
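The mismatch described above can be illustrated with a standalone SimpleDateFormat check (a sketch, not TsProcessor's actual parsing code):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

public class TsFormatCheck {

    // TsProcessor's default time format, per the message above.
    static final String DEFAULT_FORMAT = "yyyy-MM-dd HH:mm:ss,SSS";

    static boolean parses(String line) {
        try {
            new SimpleDateFormat(DEFAULT_FORMAT, Locale.US).parse(line);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A log4j-style timestamp matches the default format...
        System.out.println(parses("2010-06-04 13:09:02,123 INFO ..."));   // true
        // ...but a syslog-style line does not, so the record lands in InError.
        System.out.println(parses("Jun 4 13:09:02 ps3156 CRON[19815]:")); // false
    }
}
```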

Here's the code:

http://svn.apache.org/viewvc/hadoop/chukwa/trunk/src/java/org/apache/hadoop/chukwa/extraction/demux/processor/mapper/TsProcessor.java?view=markup

And here's how to configure the time format expected by the processor:
https://issues.apache.org/jira/browse/CHUKWA-472

And here's how to set the default processor to something other than what's hardcoded (which
is DefaultProcessor):
https://issues.apache.org/jira/browse/CHUKWA-473
On Thu, Jun 3, 2010 at 10:15 AM, Jerome Boulon <jboulon@netflix.com> wrote:
The default TSProcessor expects every record/line to start with a date.

The only thing that matters is the record delimiter. All current readers
use "\n" as the record delimiter.
So for your specific case, is "\n" the right record delimiter?
If yes, then there's a bug in the reader; create a Jira for it.
If "\n" is not the record delimiter, then you have to write your own reader,
change your log format to use "\n" as the record delimiter, or escape the "\n"
as we do in the log4j appender.
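The last option, escaping the "\n" so that a multi-line record stays on one physical line, can be sketched as follows (an illustration of the idea, not the log4j appender's actual code):

```java
public class NewlineEscape {

    // Escape backslashes first, then newlines, so the escaping is reversible
    // and a multi-line record becomes exactly one physical log line.
    static String escape(String record) {
        return record.replace("\\", "\\\\").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String record = "Exception in thread \"main\"\n\tat Foo.bar(Foo.java:42)";
        String oneLine = escape(record);
        System.out.println(oneLine.contains("\n")); // false: safe to write as one record
        System.out.println(oneLine);
    }
}
```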

/Jerome.


On 6/3/10 12:14 AM, "Stuti Awasthi" <Stuti_Awasthi@persistent.co.in> wrote:

> Hi,
>
> I checked the new TsProcessor class, but I don't think I have to change
> the date format, as I'm using standard SysLog-type log files.
>
> In my case, I am using TsProcessor. It is able to partially parse the log
> files correctly and generate .evt files beneath the repos/ dir. However, there
> is also an error directory, and most of the data is going into that directory.
> I am getting the date parse exception.
>
> I tried to find out why some of the data could be parsed and the rest
> could not. I found that this is because the data is getting
> divided into chunks as follows:
>
> Suppose the contents of the log file are as follows:
>
> May 29 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD (  [ -x
> /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/
> -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -n 200 -r -0 rm)
> May 29 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD (  [ -x
> /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/
> -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -n 200 -r -0 rm)
> May 29 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD (  [ -x
> /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/
> -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -n 200 -r -0 rm)
> May 29 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD (  [ -x
> /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/
> -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -n 200 -r -0 rm)
>
>
> Chunk 1:
>
> May 29 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD (  [ -x
> /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/
> -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -n 200 -r -0 rm)
> May 29 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD (  [ -x
> /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/
> -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 |
>
>
> Chunk 2:
>
> xargs -n 200 -r -0 rm)
> May 29 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD (  [ -x
> /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/
> -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -n 200 -r -0 rm)
> May 29 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD (  [ -x
> /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/
> -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -n 200 -r -0 rm)
>
> There is no problem with the first chunk. It gets parsed properly and a .evt
> file is created.
> But the second chunk starts with "xargs -n 200 -r -0 rm)", which is not a
> valid date format, so the date parse exception is thrown.
> The problem is with the way the data is getting divided into chunks.
>
> So is there any way to divide the data into chunks along record boundaries?
> Any pointers would help.
>
> -----Original Message-----
> From: Bill Graham [mailto:billgraham@gmail.com]
> Sent: Tuesday, June 01, 2010 5:36 AM
> To: chukwa-user@hadoop.apache.org
> Cc: Jerome Boulon
> Subject: Re: Problem in chukwa output
>
> The unparseable date errors are due to the map processor not being
> able to properly extract the date from the record. Look at the
> TsProcessor (on the trunk) and the latest demux configs for examples
> of how to configure a processor for a given date format.
>
> I'm away from my computer now, but if you search for jiras assigned to
> me, you should find the relevant patches.
>
> On Friday, May 28, 2010, Stuti Awasthi <Stuti_Awasthi@persistent.co.in> wrote:
>> Hi,
>>
>> Sorry for replying late; I was trying what you suggested.
>>
>> Yes, it worked for me. The rotation factor increased my file size, but now I have
>> another issue :)
>>
>> @Issue:
>>
>> When the chukwa demuxer gets the log for processing, the output gets distributed
>> into 2 directories:
>>
>> 1) After correct processing, it generates .evt files.
>> 2) The Chukwa parser does not parse the data properly and ends up writing to the
>> ..InError directory.
>>
>> Rotation Time: 5 min to 1 Hour
>>
>> 1. System Logs
>> Log File Used: message1
>> Datatype Used: SysLog
>> Error: java.text.ParseException: Unparseable date: "y  4 06:12:38 p"
>>
>> 2. Hadoop Logs
>> Log Files Used: Hadoop DataNode logs, Hadoop TaskTracker logs
>> Datatype Used: HadoopLog
>> Error: java.text.ParseException: Unparseable date: "0 for block blk_1617125"
>>
>> 3. Chukwa Agent Logs
>> Log File Used: Chukwa Agent logs
>> Datatype Used: chukwaAgent
>> Error: org.json.JSONException: A JSONObject text must begin with '{'
>> at character 1 of post thread ChukwaHttpSender - collected 1 chunks
>>
>> I am wondering why the data is getting into these InError directories. Is there
>> any way we can get correct .evt files after demuxing rather than these
>> InError.evt files?
>>
>> Thanks
>>
>> Stuti
>>
>> From: Jerome Boulon [mailto:jboulon@netflix.com]
>> Sent: Thursday, May 27, 2010 1:01 AM
>> To: chukwa-user@hadoop.apache.org
>> Subject: Re: Problem in chukwa output
>>
>> Hi,
>> The demux groups your data per date/hour/TimeWindow, so yes, one .done file
>> can be split into multiple .evt files depending on the content/timestamps of
>> your data.
>> Generally, if you have a SysLogInError directory, it's because the parser
>> threw an exception, and you should find some files in there.
>>
>> You may want to take a look at this wiki page to get an idea of the demux
>> data flow:
>> http://wiki.apache.org/hadoop/Chukwa_Processes_and_Data_Flow
>>
>> Regards,
>> /Jerome.
>>
>> On 5/26/10 10:55 AM, "Stuti Awasthi" <Stuti_Awasthi@persistent.co.in> wrote:
>>
>> Hi all,
>> I am facing some problems with the chukwa output.
>>
>> The following is the process flow in the Collector
>> (I worked with a single .done file, 16MB in size, for the analysis):
>>
>> 1) Logs were collected in the /logs directory.
>> 2) After demux processing, the output was stored in the /repos directory.
>>
>> Following is the structure inside repos:
>>
>>   /repos
>>     /SysLog            (Total Size: 1MB)
>>       /20100503/*.evt
>>       /20100504/*.evt
>>     /SysLogInError     (Total Size: 15MB)
>>       /../*.evt
>>
>> I have 2 doubts:
>>
>> 1. I noticed that my single log file was split into multiple .evt files, and
>> my output contained 2 folders inside /SysLog. Is it correct behaviour that a
>> single .done file is split into n .evt files in different directories?
>>
>> 2. A SysLogInError directory was generated, but there was no ERROR in the log
>> file. I am not sure when this directory gets created.
>>
>> Any pointers will be helpful.
>> Thanks
>> Stuti
>> DISCLAIMER ========== This e-mail may contain privileged and confidential
>> information which is the property of Persistent Systems Ltd. It is intended
>> only for the use of the individual or entity to which it is addressed. If you
>> are not the intended recipient, you are not authorized to read, retain, copy,
>> print, distribute or use this message. If you have received this
>> communication
>> in error, please notify the sender and delete all copies of this message.
>> Persistent Systems Ltd. does not accept any liability for virus infected
>> mails.

