incubator-chukwa-user mailing list archives

From Corbin Hoenes <cor...@tynt.com>
Subject Re: Problem in ChukwaRecord file contents
Date Wed, 09 Jun 2010 15:00:33 GMT
Gerrit -

It's my understanding that the CharFileTailingAdaptorUTF8 sends only 1 line per record.  Why
can't Stuti just use this?
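
For reference, "one line per record" comes down to ending a record at every newline in the tailed buffer. Below is a minimal standalone sketch of that idea in plain Java. It is not Chukwa's actual code and does not touch the ChunkReceiver API; the names LineRecordSplitter, recordEndOffsets and splitRecords are hypothetical, chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Chukwa: shows line-based record boundaries.
public class LineRecordSplitter {

    /** Returns the index of every '\n' byte in buf; each one ends a record. */
    static List<Integer> recordEndOffsets(byte[] buf) {
        List<Integer> ends = new ArrayList<>();
        for (int i = 0; i < buf.length; i++) {
            // In UTF-8 the byte 0x0A only ever encodes a newline,
            // so scanning raw bytes is safe.
            if (buf[i] == '\n') {
                ends.add(i);
            }
        }
        return ends;
    }

    /** Cuts buf into one string per complete line; a trailing partial line is left out. */
    static List<String> splitRecords(byte[] buf) {
        List<String> records = new ArrayList<>();
        int start = 0;
        for (int end : recordEndOffsets(buf)) {
            records.add(new String(buf, start, end - start, StandardCharsets.UTF_8));
            start = end + 1; // next record begins after the newline
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] buf = ("May 29 13:00:16 ps3156 syslogd 1.5.0#5ubuntu3: restart.\n"
                + "May 29 13:00:16 ps3156 anacron[4148]: Normal exit (1 job run)\n"
                + "May 29 13:09:02 ps3156 partial line, no newline yet")
                .getBytes(StandardCharsets.UTF_8);
        for (String record : splitRecords(buf)) {
            System.out.println(record);
        }
    }
}
```

The partial last line is skipped on purpose: a tailer would normally wait for the rest of that line to arrive before emitting it.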

On Jun 8, 2010, at 7:55 AM, Gerrit Jansen van Vuuren wrote:

> Have a look at:
>  
>  
> org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.LWFTAdaptor
>  
> protected int extractRecords(ChunkReceiver eq, long buffOffsetInFile,
>       byte[] buf) throws InterruptedException {
>  
>  
> org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8
> protected int extractRecords(ChunkReceiver eq, long buffOffsetInFile,
>       byte[] buf) throws InterruptedException {
>  
> If you need one line per record you’d have to write your own adaptor. Maybe subclass CharFileTailingAdaptorUTF8.
>  
>  
>  
> From: Stuti Awasthi [mailto:Stuti_Awasthi@persistent.co.in] 
> Sent: Tuesday, June 08, 2010 1:31 PM
> To: chukwa-user@hadoop.apache.org; Gerrit van Vuuren
> Subject: RE: Problem in ChukwaRecord file contents
>  
> So does that mean that we will always have several lines of log data in the <body> tag of a Chukwa record?
> Can you please tell me where the agent code that defines this is?
>  
> I have read these ChukwaRecords through MapReduce and can read the original log lines. :)
>  
> Stuti
>  
> From: Gerrit Jansen van Vuuren [mailto:gvanvuuren@specificmedia.com] 
> Sent: Tuesday, June 08, 2010 5:53 PM
> To: chukwa-user@hadoop.apache.org
> Subject: RE: Problem in ChukwaRecord file contents
>  
> Each chukwa record will contain several lines of log data (depending on how the agent defines lines :) ).
>  
> You can use the MapReduce Jobs, HDFS or Pig to read these files.  You might need to do some coding though.
>  
> I use Pig to read the Chukwa files, and then to get the original log lines I output the data column (i.e. these original records) using a Pig BinStorage.
>  
> Have a look at com.specificmedia.hadoop.logimport.demux.chukwa.ChukwaArchive() and the other chukwa-core classes.
>  
> Hope this helps.
>  
>  
> Cheers,
>  
>  
>  
> From: Stuti Awasthi [mailto:Stuti_Awasthi@persistent.co.in] 
> Sent: Tuesday, June 08, 2010 12:51 PM
> To: chukwa-user@hadoop.apache.org
> Subject: Problem in ChukwaRecord file contents
>  
> Hi All,
>  
> I gave my log file as input to Chukwa and converted it to a .evt file, i.e. a ChukwaRecord file.
> I checked the ChukwaRecord file, which is a sequence file with ChukwaRecordKey and ChukwaRecord.
> I saw that the ChukwaRecord contains Timestamp and some other fields. One of them is the “body” field.
> However, this body field of each record contains a bunch of lines (from my original log file).
>  
> Contents of Original log file:
>  
> May 29 13:00:16 ps3156 syslogd 1.5.0#5ubuntu3: restart.
> May 29 13:00:16 ps3156 anacron[4148]: Job `cron.daily' terminated
> May 29 13:00:16 ps3156 anacron[4148]: Normal exit (1 job run)
> May 29 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -n 200 -r -0 rm)
> May 29 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -n 200 -r -0 rm)
>  
>  
> Contents of ChukwaRecord file:
>  
> {"DataType": "SysLog", "Key": "1275118200000/ps3156.persistent.co.in/1275118216000", "Timestamp": 1275118216000, "mapFields": {"csource": "ps3156.persistent.co.in", "capp": "/home/hadoop/Test/syslog_test", "ctags": " cluster="chukwa"", "body": "May 29 13:00:16 ps3156 syslogd 1.5.0#5ubuntu3: restart.
> May 29 13:00:16 ps3156 anacron[4148]: Job `cron.daily' terminated
> May 29 13:00:16 ps3156 anacron[4148]: Normal exit (1 job run)
> May 29 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -n 200 -r -0 rm)
> May 29 13:09:02 ps3156 /USR/SBIN/CRON[19815]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -n 200 -r -0 rm) } }
>  
> I can see that each record in the ChukwaRecord file contains a chunk of lines from the original log file. Is this behavior correct?
> According to my understanding, each record in the ChukwaRecord file should contain only one line from the original log file.
> Is it possible to create such a ChukwaRecord file?
> Please suggest.
>  
>  
> Stuti
> DISCLAIMER ========== This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.
> 
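
On the reading side, recovering the original log lines from a multi-line body is a plain string split. The sketch below assumes the body field has already been pulled out of the record as a String; how it is read from the ChukwaRecord is not shown, and BodyLines and toLines are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Chukwa: splits a record body into log lines.
public class BodyLines {

    /** Splits a multi-line body into its non-empty lines (handles \n and \r\n). */
    static List<String> toLines(String body) {
        List<String> lines = new ArrayList<>();
        for (String line : body.split("\\r?\\n")) {
            if (!line.isEmpty()) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String body = "May 29 13:00:16 ps3156 anacron[4148]: Job `cron.daily' terminated\n"
                + "May 29 13:00:16 ps3156 anacron[4148]: Normal exit (1 job run)\n";
        for (String line : toLines(body)) {
            System.out.println(line);
        }
    }
}
```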

