incubator-chukwa-user mailing list archives

From Corbin Hoenes
Subject Re: multiple threads/HttpConnector from ChukwaAgent
Date Thu, 26 Jan 2012 21:27:10 GMT

We use Chukwa for log aggregation from our web servers, and it powers our analytics pipeline. It's
been super useful and solid, but we are running into a bit of a problem. I was hoping to split
my data stream: create a realtime pipeline with HBase while still streaming into HDFS for batch MR
processing.

I am running some simple calculations on incoming pageviews and wanted to update HBase using
counters. This is slow right now, since I really only have one servlet processing my chunks in
my demo environment. Without the realtime HBase counters in the pipeline, data flows a couple of
orders of magnitude faster. I was hoping that with smaller chunks and many more collector servlets
I could make it scale better, but right now the counter updates slow down the data stream too much.

We use only 3 collectors in production and they handle the traffic well, but adding more
would give us more concurrent HBase write capacity. I was hoping there was a knob to allow
more concurrent chunk writing.
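To make the requested "knob" concrete, here is a minimal Java sketch of a concurrent chunk writer. It is hypothetical and not part of Chukwa. Each incoming chunk is handed to a fixed-size thread pool, and the pool size (`writerThreads`) plays the role of the concurrency setting. HBase is replaced by an in-memory map of counters so the example runs standalone. `ConcurrentCounterWriter`, `submitChunk` and `writerThreads` are made-up names.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class ConcurrentCounterWriter {
    // Stand-in for HBase counter cells: page URL -> pageview count (assumption, not HBase).
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();
    private final ExecutorService pool;

    // writerThreads is the hypothetical knob: how many chunks are written at the same time.
    public ConcurrentCounterWriter(int writerThreads) {
        this.pool = Executors.newFixedThreadPool(writerThreads);
    }

    // Each chunk (here simply the page URLs from its pageview lines) runs on a pool thread.
    public void submitChunk(List<String> pageviews) {
        pool.submit(() -> {
            for (String page : pageviews) {
                // A real writer would issue an HBase counter increment here; that is the slow call.
                counters.computeIfAbsent(page, k -> new LongAdder()).increment();
            }
        });
    }

    // Stop accepting chunks and wait for in-flight writes to finish.
    public void close() throws InterruptedException {
        pool.shutdown();
        if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
            pool.shutdownNow();
        }
    }

    // Current count for a page, 0 if it was never seen.
    public long count(String page) {
        LongAdder adder = counters.get(page);
        return adder == null ? 0 : adder.sum();
    }

    public static void main(String[] args) throws InterruptedException {
        ConcurrentCounterWriter writer = new ConcurrentCounterWriter(4);
        for (int i = 0; i < 100; i++) {
            writer.submitChunk(List.of("/home", "/search", "/home"));
        }
        writer.close();
        System.out.println("/home = " + writer.count("/home")); // prints "/home = 200"
    }
}
```

With a single writer thread, every slow increment blocks the next chunk. A pool of N threads lets up to N chunks wait on the store at once. Because counter increments are commutative, chunks may finish out of order without changing the final counts.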

On Jan 26, 2012, at 1:03 PM, Eric Yang wrote:

> Hi Corbin,
> This is by design.  We concatenate all data streams into an in-memory
> queue on the agent and establish only one HTTP connection to the
> collector.  This is for horizontal scalability, so that we can support
> more machines.  At the same time, it also ensures that the agent can write
> more data per HTTP POST, reducing the overhead of HTTP headers and
> connection handshakes.
> regards,
> Eric
> On Thu, Jan 26, 2012 at 11:51 AM, Corbin Hoenes wrote:
>> I am trying to do some real-time processing of the data coming into my Chukwa pipeline
>> and noticed that, using a single agent, I don't seem to be getting very many servlets handling
>> the requests. Peeking at the ChukwaAgent code, it looks like the agents are limited to a single
>> Is this by design or am I off-base in my analysis of how it works?
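Here is a minimal Java sketch of the design Eric describes. The names are simplified and are not Chukwa's actual classes. Every data stream on the agent appends chunks to one shared in-memory queue. A single sender then drains up to `maxPerPost` chunks at a time into one batch, which stands in for one HTTP POST on the agent's only collector connection. The HTTP call itself is omitted. `AgentQueueSketch`, `sendOnce` and `maxPerPost` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class AgentQueueSketch {
    // Hypothetical stand-in for a chunk: the stream it came from plus its payload.
    static final class Chunk {
        final String stream;
        final String data;

        Chunk(String stream, String data) {
            this.stream = stream;
            this.data = data;
        }
    }

    // One in-memory queue shared by every data stream on the agent.
    private final BlockingQueue<Chunk> queue = new LinkedBlockingQueue<>();
    // Batches the single sender has "posted"; a real agent would send each one over HTTP.
    private final List<List<Chunk>> postedBatches = new ArrayList<>();

    // Called by any stream (e.g. one per tailed log file) to hand a chunk to the agent.
    public void add(String stream, String data) {
        queue.add(new Chunk(stream, data));
    }

    // Drains up to maxPerPost chunks into one batch, i.e. one POST on the single connection.
    // Returns how many chunks were sent (0 when the queue is empty).
    public int sendOnce(int maxPerPost) {
        List<Chunk> batch = new ArrayList<>();
        queue.drainTo(batch, maxPerPost);
        if (!batch.isEmpty()) {
            postedBatches.add(batch);
        }
        return batch.size();
    }

    public int postCount() {
        return postedBatches.size();
    }

    public static void main(String[] args) {
        AgentQueueSketch agent = new AgentQueueSketch();
        // Two streams interleave into the same queue.
        for (int i = 0; i < 10; i++) {
            agent.add(i % 2 == 0 ? "access.log" : "error.log", "line " + i);
        }
        // The single sender keeps draining until nothing is left.
        while (agent.sendOnce(4) > 0) {
            // each iteration is one simulated POST
        }
        System.out.println(agent.postCount() + " posts for 10 chunks"); // prints "3 posts for 10 chunks"
    }
}
```

Draining 10 chunks with `maxPerPost = 4` yields 3 posts instead of 10, which is where the per-POST header overhead is saved. The trade-off is that the single sender limits how much chunk delivery from one agent can happen in parallel. That is the bottleneck raised in the original question.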
