hadoop-hdfs-user mailing list archives

From Inder Pall <inder.p...@gmail.com>
Subject Re: Large-scale collection of logs from multiple Hadoop nodes
Date Tue, 06 Aug 2013 04:54:22 GMT
We have been using a Flume-like system for such use cases at a significantly
large scale, and it has been working quite well.

I would like to hear thoughts on, and challenges with, using ZeroMQ-like
systems at a sufficiently large scale.

"you are the average of 5 people you spend the most time with"
On Aug 5, 2013 11:29 PM, "Public Network Services" <
publicnetworkservices@gmail.com> wrote:

> Hi...
> I am facing a large-scale usage scenario of log collection from a Hadoop
> cluster and am examining how it should be implemented.
> More specifically, imagine a cluster that has hundreds of nodes, each of
> which constantly produces Syslog events that need to be gathered and
> analyzed at another point. The total amount of logs could be tens of
> gigabytes per day, if not more, and the reception rate in the order of
> thousands of events per second, if not more.
> One solution is to send those events over the network (e.g., using
> Flume) and collect them in one or more (fewer than 5) nodes in the cluster,
> or in another location, where the logs will be processed either by a
> constantly running MapReduce job or by non-Hadoop servers running some log
> processing application.
> Another approach could be to deposit all these events into a queuing
> system like ActiveMQ or RabbitMQ, or whatever.
> In all cases, the main objective is to be able to do real-time log
> analysis.
> What would be the best way of implementing the above scenario?
> Thanks!
