incubator-chukwa-user mailing list archives

From Thushara Wijeratna <thu...@gmail.com>
Subject Re: alerts with chukwa
Date Mon, 26 Oct 2009 23:05:45 GMT
Thanks, Ariel.

SocketTeeWriter seems very useful. In testing it, I may have
found and fixed a minor bug as well.
Inside SocketTeeWriter.Tee.setup(), the output stream should be created
early on so that parser exceptions can be properly sent to the
client.
The patch simply moves the creation of the output stream up, like this:

        in = new BufferedReader(new InputStreamReader(sock.getInputStream()));
        out = new DataOutputStream(sock.getOutputStream());
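
To show why the ordering matters, here is a minimal, self-contained sketch.
It is not the Chukwa source: the class name, the setup() and roundTrip()
helpers, the "RAW " prefix check, and the OK/ERROR replies are all invented
for the demo. The only point is that the error branch can write to the
client because the output stream already exists when parsing fails.

```java
import java.io.BufferedReader;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class EarlyOutputStreamDemo {

    // Hypothetical stand-in for Tee.setup(): both streams are created
    // before the command is parsed, so a parse failure can be reported.
    static void setup(Socket sock) throws IOException {
        BufferedReader in = new BufferedReader(
            new InputStreamReader(sock.getInputStream(), StandardCharsets.US_ASCII));
        DataOutputStream out = new DataOutputStream(sock.getOutputStream());
        try {
            String cmd = in.readLine();
            // Hypothetical parser: only lines starting with "RAW " are accepted.
            if (cmd == null || !cmd.startsWith("RAW ")) {
                throw new IllegalArgumentException("bad command: " + cmd);
            }
            out.write("OK\n".getBytes(StandardCharsets.US_ASCII));
        } catch (IllegalArgumentException e) {
            // Reachable only because 'out' already exists at this point.
            out.write(("ERROR " + e.getMessage() + "\n")
                .getBytes(StandardCharsets.US_ASCII));
        }
        out.flush();
    }

    // Sends one newline-terminated command over a loopback connection
    // and returns the first line of the reply.
    static String roundTrip(String command) throws Exception {
        try (ServerSocket server =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread t = new Thread(() -> {
                try (Socket s = server.accept()) {
                    setup(s);
                } catch (IOException ignored) {
                    // Demo only: a failed connection shows up as a null reply.
                }
            });
            t.start();
            try (Socket client =
                     new Socket(server.getInetAddress(), server.getLocalPort())) {
                client.getOutputStream()
                    .write(command.getBytes(StandardCharsets.US_ASCII));
                client.getOutputStream().flush();
                BufferedReader reply = new BufferedReader(
                    new InputStreamReader(client.getInputStream(),
                                          StandardCharsets.US_ASCII));
                String line = reply.readLine();
                t.join();
                return line;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("RAW datatype=SysLog\n"));
        System.out.println(roundTrip("BOGUS\n"));
    }
}
```

If the output stream were created only after a successful parse, the catch
block in this sketch would have nothing to write to, and the client would
just see the connection close.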

I can submit a patch if you let me know the process.

thanks,
thushara

On Sun, Oct 25, 2009 at 10:38 PM, Ariel Rabkin <asrabkin@gmail.com> wrote:
> Ack. Sorry not to get back to you earlier.
>
> Yes, it's possible to do real-time monitoring with Chukwa. We do it
> here at Berkeley, and it seems to work well. It's not possible to do
> real-time analysis with MapReduce, however. And it's troublesome to do
> it on top of HDFS.  Those files in /chukwa/logs are not produced by MR
> jobs. The Chukwa collector is simply writing data to file, and closes
> the file every five minutes. Until a complete block is written, or the
> file is closed, concurrent reads won't see the data. So we close files
> every five minutes. That's unfortunately part of how HDFS works.
>
> If you want to scan the data in real-time, you should probably look at
> the PipelineStageWriter/PipelineableWriter classes.  Effectively,
> those let you write a filter that examines each chunk of logfile as it
> goes past in real time.  You can pull the monitoring out into a
> separate process, using the SocketTeeWriter -- documented here:
> http://www.cs.berkeley.edu/~asrabkin/chukwa/collector.html#SocketTeeWriter
>
> --Ari
>
> On Fri, Oct 23, 2009 at 1:20 PM, Thushara Wijeratna <thushw@gmail.com> wrote:
>> Hi,
>>
>> I want to develop a real-time alerting system. The alert condition can
>> be grepped from live logs. There are multiple machines that generate
>> logs, so I thought Chukwa might be suitable.
>> I did an install with 0.3.0 with an adaptor like this:
>>
>> add org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8NewLineEscaped
>> SysLog 0 /mpire/app/buyer/buyer.log 0
>>
>> When I look under HDFS, I see that the file was dumped there under
>> /chukwa/logs (identity M/R operation).
>> How do I intercept chukwa and do my own M/R, so that I can generate a
>> reduced log under /chukwa/logs? I could do this easily with Hadoop, if
>> I could find the HDFS directory to which chukwa collector dumps the
>> data from the chukwa agents. As that seems to be happening every 5
>> min, I should be able to run a cron job to do a M/R right after that.
>>
>> Is this a viable option? Any suggestions?
>>
>> thanks,
>> thushara
>>
>
>
>
> --
> Ari Rabkin asrabkin@gmail.com
> UC Berkeley Computer Science Department
>
