incubator-chukwa-user mailing list archives

From Bill Graham <billgra...@gmail.com>
Subject Re: Problem in chukwa output
Date Thu, 03 Jun 2010 22:02:34 GMT
FYI, the TsProcessor is not the default processor, so if you want to use it
you need to configure it explicitly. If you have done that, note that the
TsProcessor's default time format is 'yyyy-MM-dd HH:mm:ss,SSS', which is not
what your logs use. If you process logs like the ones you show with the
TsProcessor without overriding the default time format, you will get many
InError files as output.

Here's the code:

http://svn.apache.org/viewvc/hadoop/chukwa/trunk/src/java/org/apache/hadoop/chukwa/extraction/demux/processor/mapper/TsProcessor.java?view=markup

And here's how to configure the time format expected by the processor:
https://issues.apache.org/jira/browse/CHUKWA-472

And here's how to set the default processor to something other than what's
hardcoded (which is DefaultProcessor):
https://issues.apache.org/jira/browse/CHUKWA-473
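
For illustration, here is a minimal Java sketch (not Chukwa code; the class
name and sample line are made up for this example) showing why the default
pattern rejects a syslog-style line, while a syslog-style pattern, i.e. the
kind of pattern you would configure via the CHUKWA-472 mechanism, parses the
leading timestamp of the same line:

    import java.text.ParseException;
    import java.text.SimpleDateFormat;
    import java.util.Locale;

    // Illustration only: the pattern strings come from this thread; nothing
    // below is taken from TsProcessor itself.
    public class TimeFormatCheck {
        public static void main(String[] args) throws ParseException {
            String syslogLine =
                "May 29 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD ( ... )";

            // TsProcessor's default pattern expects lines like
            // "2010-05-29 13:09:02,123 ..."
            SimpleDateFormat defaultFmt =
                new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS", Locale.ENGLISH);
            try {
                defaultFmt.parse(syslogLine);
            } catch (ParseException e) {
                // The same kind of "Unparseable date" failure seen in this thread.
                System.out.println("default pattern fails: " + e.getMessage());
            }

            // A syslog-style pattern (an assumption for this sketch) parses the
            // leading timestamp of the same line just fine.
            SimpleDateFormat syslogFmt =
                new SimpleDateFormat("MMM d HH:mm:ss", Locale.ENGLISH);
            System.out.println("syslog pattern parses: " + syslogFmt.parse(syslogLine));
        }
    }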

On Thu, Jun 3, 2010 at 10:15 AM, Jerome Boulon <jboulon@netflix.com> wrote:

> The default TsProcessor expects every record/line to start with a date.
>
> The only thing that matters is the record delimiter. All current readers use
> "\n" as the record delimiter.
> So for your specific case, is "\n" the right record delimiter?
> If yes, then there's a bug in the reader; please create a JIRA for it.
> If "\n" is not your record delimiter, then you have to write your own reader,
> change your log format to use "\n" as the record delimiter, or escape the "\n"
> as we do in the log4j appender.
>
> /Jerome.
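
As a rough sketch of that escaping idea (this is not the actual Chukwa log4j
appender; the helper below only illustrates keeping "\n" usable as the record
delimiter when a record body itself contains newlines):

    // Sketch only, not Chukwa code: escape newlines inside a record body before
    // writing it, so that a raw '\n' in the stream always means "end of record".
    public class RecordDelimiterSketch {

        static String escape(String recordBody) {
            return recordBody.replace("\\", "\\\\").replace("\n", "\\n");
        }

        static String unescape(String wireRecord) {
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < wireRecord.length(); i++) {
                char c = wireRecord.charAt(i);
                if (c == '\\' && i + 1 < wireRecord.length()) {
                    char next = wireRecord.charAt(++i);
                    out.append(next == 'n' ? '\n' : next);
                } else {
                    out.append(c);
                }
            }
            return out.toString();
        }

        public static void main(String[] args) {
            String multiLine = "exception line 1\n  at Foo.bar(Foo.java:42)";
            String wire = escape(multiLine) + "\n";      // '\n' is now purely a delimiter
            System.out.println(wire.split("\n").length); // 1 -> still one record
            System.out.println(unescape(wire.trim()).equals(multiLine)); // true
        }
    }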
>
>
> On 6/3/10 12:14 AM, "Stuti Awasthi" <Stuti_Awasthi@persistent.co.in>
> wrote:
>
> > Hi,
> >
> > I checked the new TsProcessor class, but I don't think I have to change
> > the date format, as I'm using standard SysLog-style log files.
> >
> > In my case, I am using the TsProcessor. It is able to partially parse the
> > log files correctly and generates .evt files beneath the repos/ dir.
> > However, there is also an error directory, and most of the data is going
> > into that directory. I am getting the date parse exception.
> >
> > I tried to find out why some of the data could be parsed and the rest
> > could not. It turns out this is because the data is getting divided into
> > chunks as follows:
> >
> > Suppose the contents of the log file are as follows:
> >
> > May 29 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -n 200 -r -0 rm)
> > May 29 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -n 200 -r -0 rm)
> > May 29 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -n 200 -r -0 rm)
> > May 29 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -n 200 -r -0 rm)
> >
> >
> > Chunk 1:
> >
> > May 29 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -n 200 -r -0 rm)
> > May 29 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 |
> >
> >
> > Chunk 2:
> >
> > xargs -n 200 -r -0 rm)
> > May 29 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -n 200 -r -0 rm)
> > May 29 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -n 200 -r -0 rm)
> >
> > There is no problem with the first chunk: it gets parsed properly and an
> > .evt file is created. But the second chunk starts with "xargs -n 200 -r -0
> > rm)", which does not begin with a valid date, so the date parse exception
> > is thrown. The problem is the way the data is being divided into chunks.
> >
> > So, is there any way to make the chunks break on record boundaries? Any
> > pointers in this case would help.
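
(Not Chukwa code, but to make the question concrete: a chunker that only cuts
between records could look roughly like the sketch below. Whole lines are
accumulated and a new chunk is started only at a line boundary, so every chunk
handed to the parser starts with a timestamp.)

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.Reader;
    import java.util.ArrayList;
    import java.util.List;

    // Sketch: split input into chunks of roughly maxChars, but only ever cut at
    // a line boundary, so no record is divided across two chunks.
    public class RecordAlignedChunker {
        static List<String> chunk(Reader in, int maxChars) throws IOException {
            List<String> chunks = new ArrayList<String>();
            BufferedReader reader = new BufferedReader(in);
            StringBuilder current = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                if (current.length() > 0
                        && current.length() + line.length() + 1 > maxChars) {
                    chunks.add(current.toString());   // cut only between records
                    current.setLength(0);
                }
                current.append(line).append('\n');
            }
            if (current.length() > 0) {
                chunks.add(current.toString());
            }
            return chunks;
        }
    }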
> >
> > -----Original Message-----
> > From: Bill Graham [mailto:billgraham@gmail.com]
> > Sent: Tuesday, June 01, 2010 5:36 AM
> > To: chukwa-user@hadoop.apache.org
> > Cc: Jerome Boulon
> > Subject: Re: Problem in chukwa output
> >
> > The unparseable date errors are due to the map processor not being
> > able to properly extract the date from the record. Look at the
> > TsProcessor (on the trunk) and the latest demux configs for examples
> > of how to configure a processor for a given date format.
> >
> > I'm away from my computer now, but if you search for JIRAs assigned to
> > me, you should find the relevant patches.
> >
> > On Friday, May 28, 2010, Stuti Awasthi <Stuti_Awasthi@persistent.co.in>
> wrote:
> >> Hi,
> >>
> >> Sorry for replying late; I was trying what you suggested. Yes, it worked
> >> for me. The rotation factor increased my file size, but now I have
> >> another issue :)
> >>
> >> @Issue:
> >>
> >> When the chukwa demuxer gets the log for processing, the output gets
> >> distributed into 2 directories:
> >>
> >> 1) After correct processing, it generates .evt files.
> >> 2) The Chukwa parser does not parse the data properly and ends up
> >> putting it into an ..InError directory.
> >>
> >> Rotation time: 5 min to 1 hour
> >>
> >> 1. System logs
> >> Log file used: message1
> >> Datatype used: SysLog
> >> Error: java.text.ParseException: Unparseable date: "y  4 06:12:38 p"
> >>
> >> 2. Hadoop logs
> >> Log files used: Hadoop datanode logs, Hadoop TaskTracker logs
> >> Datatype used: HadoopLog
> >> Error: java.text.ParseException: Unparseable date: "0 for block blk_1617125"
> >>
> >> 3. Chukwa agent logs
> >> Log file used: Chukwa agent logs
> >> Datatype used: chuwaAgent
> >> Error: org.json.JSONException: A JSONObject text must begin with '{'
> >> at character 1 of post thread ChukwaHttpSender - collected 1 chunks
> >>
> >> I am wondering why data is getting into these InError directories. Is
> >> there any way we can get correct .evt files after demuxing rather than
> >> these InError.evt files?
> >>
> >> Thanks
> >> Stuti
> >>
> >> From: Jerome Boulon [mailto:jboulon@netflix.com]
> >> Sent: Thursday, May 27, 2010 1:01 AM
> >> To: chukwa-user@hadoop.apache.org
> >> Subject: Re: Problem in chukwa output
> >>
> >> Hi,
> >> The demux groups your data per date/hour/time window, so yes, one .done
> >> file can be split into multiple .evt files depending on the
> >> content/timestamps of your data.
> >> Generally, if you have a SysLogInError directory, it's because the parser
> >> threw an exception, and you should have some files in there.
> >>
> >> You may want to take a look at this wiki page to get an idea of the Demux
> >> data flow:
> >> http://wiki.apache.org/hadoop/Chukwa_Processes_and_Data_Flow
> >>
> >> Regards,
> >> /Jerome.
> >>
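
To make that grouping concrete, here is a rough illustration (not the real
demux code) of time-window bucketing: each record lands in a bucket derived
from its own timestamp, which is why one 16 MB .done file whose records span
two days fans out into directories such as 20100503/ and 20100504/ as in the
/repos listing further down in this thread. The timestamps below are made up.

    import java.text.SimpleDateFormat;
    import java.util.Arrays;
    import java.util.Date;
    import java.util.List;
    import java.util.Map;
    import java.util.TimeZone;
    import java.util.TreeMap;

    // Sketch only: group records into day buckets by each record's own
    // timestamp. The real demux output layout is Chukwa's; this just shows
    // the bucketing idea.
    public class TimeWindowBucketing {
        public static void main(String[] args) {
            long may3 = 1272844800000L;            // 2010-05-03 00:00:00 UTC
            long may4 = may3 + 24L * 3600 * 1000;  // one day later
            List<Long> recordTimestamps = Arrays.asList(may3, may3 + 60_000, may4);

            SimpleDateFormat day = new SimpleDateFormat("yyyyMMdd");
            day.setTimeZone(TimeZone.getTimeZone("UTC"));

            Map<String, Integer> buckets = new TreeMap<String, Integer>();
            for (long ts : recordTimestamps) {
                buckets.merge(day.format(new Date(ts)), 1, Integer::sum);
            }
            System.out.println(buckets);  // {20100503=2, 20100504=1}
        }
    }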
> >> On 5/26/10 10:55 AM, "Stuti Awasthi" <Stuti_Awasthi@persistent.co.in>
> >> wrote:
> >>
> >> Hi all,
> >> I am facing some problems with the chukwa output.
> >>
> >> The following is the process flow in the collector (I worked with a
> >> single .done file of 16 MB for this analysis):
> >>
> >> 1) Logs were collected in the /logs directory.
> >> 2) After demux processing, the output was stored in the /repos directory.
> >>
> >> The structure inside repos is as follows:
> >>
> >> /repos
> >>     /SysLog            (total size: 1 MB)
> >>         /20100503/*.evt
> >>         /20100504/*.evt
> >>     /SysLogInError     (total size: 15 MB)
> >>         /../*.evt
> >>
> >> I have 2 doubts:
> >>
> >> 1. I noticed that my single log file was split into multiple .evt files,
> >> and my output contained 2 folders inside /SysLog. Is it the correct
> >> behaviour that a single .done file is split into n .evt files under
> >> different directories?
> >>
> >> 2. A SysLogInError directory was generated, but there was no ERROR in
> >> the log file. I am not sure when this directory gets created.
> >>
> >> Any pointers will be helpful.
> >> Thanks
> >> Stuti
