incubator-chukwa-user mailing list archives

From Ariel Rabkin <>
Subject Re: Where to (physically) place the collector
Date Mon, 01 Mar 2010 20:53:57 GMT
I think best practice is actually to put the collector on the
datanode[s]. There's no particular reason to funnel filesystem writes
through the namenode: the namenode only handles metadata requests, and
that traffic is very small compared to the overall volume being written,
which is streamed directly to the datanodes.
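To put rough numbers on "very small", here is a back-of-envelope sketch. It assumes a simplified model of the HDFS write path (one create() call, one addBlock() call per block, one complete() call on close; lease renewals and retries ignored), plus a hypothetical load of 10 MiB/s into one collector, files closed every 300 seconds, and a 64 MiB block size. The function name is illustrative, not part of any Hadoop or Chukwa API.

```python
MiB = 2**20

def namenode_rpcs_for_file(file_bytes: int, block_size: int) -> int:
    """Simplified count of namenode calls needed to write one HDFS file.

    create() once, addBlock() once per block, complete() on close.
    The file's bytes themselves go to the datanode pipeline, not the namenode.
    """
    blocks = (file_bytes + block_size - 1) // block_size  # integer ceiling
    return 2 + blocks

# Assumed load: 10 MiB/s into one collector, files closed every 300 s.
file_bytes = 10 * MiB * 300   # 3000 MiB per rotated file
block_size = 64 * MiB         # assumed HDFS block size
rpcs = namenode_rpcs_for_file(file_bytes, block_size)
print(rpcs)                   # 49 small metadata calls to the namenode
print(file_bytes // MiB)      # 3000 MiB streamed to datanodes
```

Under these assumptions the namenode sees a few dozen small requests per five-minute file, while thousands of MiB go to the datanodes, so co-locating the collector with the namenode buys little.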

The collector is not writing only every five minutes; it writes
continuously. However, the filesystem doesn't promise that data will be
visible until a block boundary, which we impose at least every five
minutes by closing files.
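The "write continuously, close periodically" pattern can be sketched as below. This is a minimal illustration, not Chukwa's collector code: `RotatingSink` and `MemFile` are hypothetical names, `MemFile` stands in for an HDFS output stream, and the clock is injectable so the example is deterministic. Every chunk is written as soon as it arrives; rotation only decides which file it lands in, and only closed files are treated as reliably visible to readers.

```python
import time
from typing import Callable, List

class MemFile:
    """Hypothetical stand-in for an HDFS output stream."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.data = bytearray()
        self.closed = False

    def write(self, chunk: bytes) -> None:
        if self.closed:
            raise ValueError("write to closed file")
        self.data += chunk

    def close(self) -> None:
        self.closed = True

class RotatingSink:
    """Appends data immediately; closes the current file once
    rotate_secs have elapsed since it was opened."""
    def __init__(self, rotate_secs: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rotate_secs = rotate_secs
        self.clock = clock
        self.files: List[MemFile] = []
        self._open_new()

    def _open_new(self) -> None:
        self.files.append(MemFile(f"sink-{len(self.files):05d}"))
        self.opened_at = self.clock()

    @property
    def current(self) -> MemFile:
        return self.files[-1]

    def write(self, chunk: bytes) -> None:
        # Rotate first if the current file is old enough, then write.
        if self.clock() - self.opened_at >= self.rotate_secs:
            self.current.close()  # closing is what makes the contents visible
            self._open_new()
        self.current.write(chunk)

    def visible(self) -> List[bytes]:
        """Data a reader can rely on: the contents of closed files only."""
        return [bytes(f.data) for f in self.files if f.closed]

# Usage with a fake clock (seconds since start).
now = [0.0]
sink = RotatingSink(rotate_secs=300, clock=lambda: now[0])
for t, msg in [(0, b"a"), (120, b"b"), (299, b"c"), (300, b"d"), (450, b"e")]:
    now[0] = t
    sink.write(msg)
print(sink.visible())            # [b'abc']
print(bytes(sink.current.data))  # b'de'
```

In this sketch rotation is checked only when data arrives; a real collector would presumably also rotate on a timer so that an idle file still gets closed within the interval.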


On Mon, Mar 1, 2010 at 6:43 AM, Oded Rosen <> wrote:
> I've been searching the docs but could find no help --
> We have some machines that produce data - and on each we have
> an adapter (agent). Those machines are 'close' to each other - same network
> (physically).
> Then, we have the HDFS cluster on other machines, on another network. The
> two networks are of course connected (via the Internet).
> So, we want to know which is better, network-wise: to put the collector on
> the same network as the adapters, or on the same computer as the HDFS
> namenode?
> Option A - collector close to adapters - seems better to me because they
> send data ALL THE TIME to the collector, while the collector sends data to
> HDFS only every 5 minutes, in one write operation.
> P.S. - our collector writes exactly what it gets from the adapters, so there
> are no considerations regarding data volumes.
> Any recommendations?
> Thanks,
> --
> Oded

Ari Rabkin
UC Berkeley Computer Science Department
