cassandra-user mailing list archives

From Trevor Francis <>
Subject High Log Storage
Date Thu, 19 Apr 2012 17:04:19 GMT
I have a web application that generates multiple log files in a log file directory. On a particularly
chatty box, up to 2,000 entries per second are written to those log files. We are looking for
a solution to tail that directory and insert new entries into a Cassandra db.

The fields in the log file are pipe delimited, but we can delimit the data points using any
delimiter. We would want to structure the data such that each data point would get its own
column when it's inserted into Cassandra.
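For what it's worth, the per-column mapping described above can be sketched in a few lines of Python. The column names here are hypothetical examples, not from the original post; the real schema would come from whatever fields the application actually logs.

```python
# Sketch: split one pipe-delimited log entry into per-column values,
# so each data point maps to its own Cassandra column.
# COLUMNS below is a hypothetical schema, purely for illustration.

COLUMNS = ["ts", "host", "level", "message"]

def parse_entry(line, delimiter="|"):
    """Map each delimited data point in a log line to its own column name."""
    values = line.rstrip("\n").split(delimiter)
    return dict(zip(COLUMNS, values))

entry = parse_entry("2012-04-19 17:04:19|web01|INFO|request served")
# Each data point now maps to a column, ready to feed a CQL INSERT such as:
# INSERT INTO logs (ts, host, level, message) VALUES (?, ?, ?, ?)
```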

We set up Flume to handle this, but the Cassandra sink isn't robust enough to handle even one
chatty machine. We may have up to 200 machines.

Any suggestions on a tool that can reliably do this? Data not making it into the Cassandra
db will cause huge problems, so reliability is a major factor to consider.
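Since losing entries is the main concern, one common approach (independent of any particular tool) is to checkpoint a byte offset per log file: the tailer persists the offset of the last fully consumed line and resumes from there after a crash, so nothing is lost or duplicated. A minimal sketch of that idea, with a hypothetical `read_new_entries` helper:

```python
# Sketch of checkpointed tailing: read only the lines appended since the
# last saved offset, and return the new offset to persist. A partial last
# line (still being written) is left for the next pass, so every entry is
# consumed exactly once across restarts.

def read_new_entries(path, offset):
    """Return (entries, new_offset) for lines appended after `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Keep only complete lines; anything after the last newline is partial.
    complete, sep, _partial = data.rpartition(b"\n")
    entries = [line.decode() for line in complete.split(b"\n")] if complete else []
    return entries, offset + len(complete) + len(sep)
```

In practice the new offset would be written to durable storage (or to Cassandra itself) only after the corresponding batch of inserts succeeds, which is what makes the pipeline restartable without data loss.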


Trevor Francis
