incubator-chukwa-user mailing list archives

From Ariel Rabkin <asrab...@gmail.com>
Subject Re: Problem in ChukwaRecord file contents
Date Wed, 09 Jun 2010 21:42:57 GMT
Yes, CharFileTailingAdaptorUTF8 is designed to do one line per record.
If it does more than that, it's a bug.
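
To illustrate what "one line per record" means for a tailing adaptor, here is a minimal standalone sketch. It is not Chukwa's code: LineRecordSplitter, Result, and split are hypothetical names, and the real extractRecords hands data to a ChunkReceiver rather than returning a list. The sketch only shows the idea of emitting one record per newline-terminated line and reporting how many bytes were used, so that in this sketch an incomplete trailing line is left for a later read.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Chukwa: illustrates "one line per record".
class LineRecordSplitter {

    /** Outcome of one pass over a buffer: the complete lines found and the bytes they used. */
    static final class Result {
        final List<String> records = new ArrayList<>();
        int bytesConsumed = 0;
    }

    /** Splits buf into one record per newline-terminated line. */
    static Result split(byte[] buf) {
        Result result = new Result();
        int lineStart = 0;
        for (int i = 0; i < buf.length; i++) {
            // The byte 0x0A never occurs inside a multi-byte UTF-8 sequence,
            // so scanning raw bytes for '\n' is safe.
            if (buf[i] == '\n') {
                // Each record is one line, including its terminating newline.
                result.records.add(
                    new String(buf, lineStart, i - lineStart + 1, StandardCharsets.UTF_8));
                lineStart = i + 1;
            }
        }
        // Bytes after the last newline form an incomplete line and are not consumed.
        result.bytesConsumed = lineStart;
        return result;
    }
}
```

For the input "a\nbb\nccc", split yields two records ("a\n" and "bb\n") and bytesConsumed is 5; the trailing "ccc" stays unconsumed.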

--Ari

On Wed, Jun 9, 2010 at 8:00 AM, Corbin Hoenes <corbin@tynt.com> wrote:
> Gerrit -
> It's my understanding that the CharFileTailingAdaptorUTF8 sends only 1 line
> per record.  Why can't Stuti just use this?
> On Jun 8, 2010, at 7:55 AM, Gerrit Jansen van Vuuren wrote:
>
> Have a look at:
>
>
> org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.LWFTAdaptor
>
> protected int extractRecords(ChunkReceiver eq, long buffOffsetInFile,
>       byte[] buf) throws InterruptedException {
>
>
> org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8
> protected int extractRecords(ChunkReceiver eq, long buffOffsetInFile,
>       byte[] buf) throws InterruptedException {
>
> If you need one line per record you’d have to write your own adaptor. Maybe
> subclass CharFileTailingAdaptorUTF8
>
>
>
> From: Stuti Awasthi [mailto:Stuti_Awasthi@persistent.co.in]
> Sent: Tuesday, June 08, 2010 1:31 PM
> To: chukwa-user@hadoop.apache.org; Gerrit van Vuuren
> Subject: RE: Problem in ChukwaRecord file contents
>
> So does that mean that we will always have several lines of log data in the
> <body> tag of a Chukwa record?
> Can you please tell me where the agent code that defines this is?
>
> I have read these ChukwaRecords through MapReduce and can read the original
> log lines. :)
>
> Stuti
>
> From: Gerrit Jansen van Vuuren [mailto:gvanvuuren@specificmedia.com]
> Sent: Tuesday, June 08, 2010 5:53 PM
> To: chukwa-user@hadoop.apache.org
> Subject: RE: Problem in ChukwaRecord file contents
>
> Each Chukwa record will contain several lines of log data (depending on how
> the agent defines lines :) ).
>
> You can use the MapReduce Jobs, HDFS or Pig to read these files.  You might
> need to do some coding though.
>
> I use Pig to read the Chukwa files, and then, to get the original log lines, I
> output the data column (i.e. the original records) using Pig's BinStorage.
>
> Have a look at
>  com.specificmedia.hadoop.logimport.demux.chukwa.ChukwaArchive() and the
> other chukwa-core classes.
>
> Hope this helps.
>
>
> Cheers,
>
>
>
> From: Stuti Awasthi [mailto:Stuti_Awasthi@persistent.co.in]
> Sent: Tuesday, June 08, 2010 12:51 PM
> To: chukwa-user@hadoop.apache.org
> Subject: Problem in ChukwaRecord file contents
>
> Hi All,
>
> I gave my log file as input to Chukwa and converted it to an .evt file, i.e.
> a ChukwaRecord file.
> I checked the ChukwaRecord file which is a sequence file with
> ChukwaRecordKey and ChukwaRecord.
> I saw that the ChukwaRecord contains Timestamp and some other fields. One of
> them is the “body” field.
> However, this body field of each record contains a bunch of lines (from my
> original log file).
>
> Contents of Original log file:
>
> May 29 13:00:16 ps3156 syslogd 1.5.0#5ubuntu3: restart.
> May 29 13:00:16 ps3156 anacron[4148]: Job `cron.daily' terminated
> May 29 13:00:16 ps3156 anacron[4148]: Normal exit (1 job run)
> May 29 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD (  [ -x
> /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/
> -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -n 200 -r -0 rm)
> May 29 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD (  [ -x
> /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/
> -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -n 200 -r -0 rm)
>
>
> Contents of ChukwaRecord file:
>
> {"DataType": "SysLog", "Key":
> "1275118200000/ps3156.persistent.co.in/1275118216000", "Timestamp":
> 1275118216000, "mapFields": {"csource": "ps3156.persistent.co.in", "capp":
> "/home/hadoop/Test/syslog_test", "ctags": " cluster="chukwa"", "body": "May
> 29 13:00:16 ps3156 syslogd 1.5.0#5ubuntu3: restart.
> May 29 13:00:16 ps3156 anacron[4148]: Job `cron.daily' terminated
> May 29 13:00:16 ps3156 anacron[4148]: Normal exit (1 job run)
> May 29 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD (  [ -x
> /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/
> -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -n 200 -r -0 rm)
> May 29 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD (  [ -x
> /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/
> -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -n 200 -r -0 rm)
> } }
>
> I can see that each record in the ChukwaRecord file contains a chunk of
> lines from the original log file. Is this behavior correct?
> According to my understanding, each record in the ChukwaRecord file should
> contain only one line from the original log file.
> Is it possible to create such a ChukwaRecord file?
> Please suggest.
>
>
> Stuti
>
> DISCLAIMER ========== This e-mail may contain privileged and confidential
> information which is the property of Persistent Systems Ltd. It is intended
> only for the use of the individual or entity to which it is addressed. If
> you are not the intended recipient, you are not authorized to read, retain,
> copy, print, distribute or use this message. If you have received this
> communication in error, please notify the sender and delete all copies of
> this message. Persistent Systems Ltd. does not accept any liability for
> virus infected mails.
>



-- 
Ari Rabkin asrabkin@gmail.com
UC Berkeley Computer Science Department
