incubator-chukwa-user mailing list archives

From Ariel Rabkin <>
Subject Re: who transfers the data?
Date Tue, 05 Jan 2010 20:07:14 GMT
There is one connection from the agent to a collector, which carries
all the data coming from that agent, from every adaptor it runs.
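To make the "one connection per agent" point concrete, here is a minimal Python sketch, assuming a greatly simplified model. The `Chunk` and `Agent` classes, `adaptor_emit`, `flush`, and the collector URL are all hypothetical illustrations and not Chukwa's actual (Java) API. Adaptors only enqueue chunks into a queue shared across the agent, and the agent drains that queue to a single collector, so ten tailed files still produce one outbound stream.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Chunk:
    # Hypothetical simplification of a chunk: the stream (e.g. log file)
    # it came from, plus the bytes the adaptor read.
    stream: str
    data: bytes


class Agent:
    """Toy agent (hypothetical): all adaptors share one queue and one connection."""

    def __init__(self, collector_url: str):
        self.collector_url = collector_url  # the single collector in use
        self.queue = deque()                # shared by every adaptor
        self.sent_batches = []              # stands in for posts over the one connection

    def adaptor_emit(self, stream: str, data: bytes) -> None:
        # An adaptor (one per tailed file) only enqueues data;
        # it never opens a connection of its own.
        self.queue.append(Chunk(stream, data))

    def flush(self) -> None:
        # Drain everything queued so far into one send to the one collector.
        batch = list(self.queue)
        self.queue.clear()
        if batch:
            self.sent_batches.append((self.collector_url, batch))


agent = Agent("http://collector1:8080/")  # hypothetical collector URL
for i in range(10):  # ten tailed log files, one adaptor each
    agent.adaptor_emit(f"/var/log/app{i}.log", b"line\n")
agent.flush()
print(len(agent.sent_batches), len(agent.sent_batches[0][1]))  # 1 10
```

All ten streams end up in a single batch addressed to the same collector, which is the behavior described above; the adaptors never choose a collector themselves.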

A collector can saturate the single-writer bandwidth of HDFS -- call
it 20-40 MB/sec.  If you have a single agent writing a large fraction
of that, say more than 4 MB/sec, then Chukwa is probably not solving
your problem anyway.
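The back-of-the-envelope arithmetic behind the 4 MB/sec threshold can be written out as a tiny Python helper. This is only a sketch: `agents_per_collector` is a hypothetical name, and it ignores protocol and serialization overhead. With a collector limited to the 20-40 MB/sec single-writer range quoted above, one agent pushing 4 MB/sec already uses a fifth to a tenth of a whole collector.

```python
def agents_per_collector(collector_mb_s: float, agent_mb_s: float) -> float:
    """How many agents of a given rate one collector can absorb before the
    HDFS single-writer bandwidth becomes the bottleneck (overhead ignored)."""
    return collector_mb_s / agent_mb_s


# Rough single-writer range quoted above: 20-40 MB/sec.
for collector in (20.0, 40.0):
    print(collector, agents_per_collector(collector, 4.0))
# 20.0 5.0
# 40.0 10.0
```

So a 4 MB/sec agent leaves room for only 5 to 10 such agents per collector, far from the "several hundred hosts" per collector that the design described below assumes.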

On Tue, Jan 5, 2010 at 2:51 PM, Corbin Hoenes <> wrote:
> It's a little unclear to me who is transferring the chunks to the
> collectors.  Does each adaptor have a connection or does the agent have a
> single connection to the collector?   For example if I have 10 log files
> that I am tailing (an adaptor for each) do they all go to the same collector
> or does it distribute those to any one of the collectors I have listed in my
> collectors file?
> "Rather than have each adaptor write directly to HDFS, data is sent across
> the network to a collector process, that does the HDFS writes. Each
> collector receives data from up to several hundred hosts, and writes all
> this data to a single sink file, which is a Hadoop sequence file of
> serialized Chunks. Periodically, collectors close their sink files, rename
> them to mark them available for processing, and resume writing a new file.
> Data is sent to collectors over HTTP."
> Corbin Hoenes
> skype: choenes
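The collector behavior in the quoted passage (append everything to one sink file, periodically close it, rename it to mark it available, then continue on a new file) can be sketched in Python as follows. This is a hypothetical simplification: `SinkWriter` is an illustrative name, plain tab-separated text stands in for the Hadoop sequence file of serialized Chunks, and the `.writing`/`.done` suffixes are assumptions, not Chukwa's actual file naming.

```python
import os
import tempfile


class SinkWriter:
    """Toy collector sink (hypothetical): one open file at a time; rotation
    closes it and renames it so downstream jobs only pick up finished files.
    The .writing/.done suffixes are assumptions for illustration."""

    def __init__(self, directory: str):
        self.directory = directory
        self.seq = 0
        self._open_new()

    def _open_new(self) -> None:
        self.path = os.path.join(self.directory, f"sink{self.seq}.writing")
        self.f = open(self.path, "w")
        self.seq += 1

    def write(self, stream: str, data: str) -> None:
        # Data from every agent lands in the same sink file.
        self.f.write(f"{stream}\t{data}\n")

    def rotate(self) -> str:
        # Close the current file, rename it to mark it done, start a new one.
        self.f.close()
        done = self.path[: -len(".writing")] + ".done"
        os.rename(self.path, done)
        self._open_new()
        return done


with tempfile.TemporaryDirectory() as d:
    w = SinkWriter(d)
    w.write("hostA/app.log", "hello")
    w.write("hostB/app.log", "world")
    finished = w.rotate()
    names = sorted(os.listdir(d))
    with open(finished) as fh:
        content = fh.read()
    w.f.close()  # close the new, still-open sink before cleanup

print(names)  # ['sink0.done', 'sink1.writing']
```

The rename is what separates "still being written" from "ready for processing", so a downstream job that only looks for finished files never reads a half-written sink.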

Ari Rabkin
UC Berkeley Computer Science Department
