Of course, I am new to the Cassandra world, so it is taking me some time to understand how everything translates into my MySQL head.
We are building an enterprise application that will ingest log information and provide metrics and trending based on the data contained in the logs. The application is transactional in nature: a record will be written to a log, and our system will need to query that record and assign two values to it, in addition to using the information to develop trending metrics.
The logs are being fed into Cassandra by Flume.
Each of our users will be assigned their own piece of hardware that generates these log events, some of which can peak at up to 2500 transactions per second for a couple of hours. The log entries are around 150 bytes each and contain around 20 different pieces of information. Neither we nor our users are interested in running any queries across the entire database. Users are only concerned with the data that their particular piece of hardware generates.
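To put rough numbers on that peak, here is a back-of-the-envelope sketch. It assumes "a couple of hours" means two hours and counts only the raw 150-byte payload for a single device, so it ignores Cassandra's own storage overhead and any replication.

```python
# Rough peak-load estimate for ONE device, using the figures quoted above.
PEAK_TPS = 2500      # peak transactions per second
ENTRY_BYTES = 150    # approximate size of one log entry
PEAK_HOURS = 2       # assumption: "a couple of hours" taken as two


def peak_volume(tps, entry_bytes, hours):
    """Return (number of writes, raw payload bytes) for one peak period."""
    writes = tps * hours * 3600
    return writes, writes * entry_bytes


bytes_per_second = PEAK_TPS * ENTRY_BYTES
writes, raw_bytes = peak_volume(PEAK_TPS, ENTRY_BYTES, PEAK_HOURS)

print(f"{bytes_per_second} bytes/s at peak")
print(f"{writes} writes, {raw_bytes / 1e9:.1f} GB raw per peak period")
```

So a single device at peak is on the order of 375 KB/s of raw payload, and about 18 million entries over a two-hour burst.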
Should I just set up a single column family with 20 columns, the first of which is the row key, and make the row key the username of that user?
We would probably also need two more columns to store Value A and Value B assigned to that particular record.
Our metrics will be something like this: for this particular user, during this particular timeframe, what is the average of field "X"? We would then store that value so that we can generate historical trending over the course of a week. We will do this every 15 minutes.
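To make the metric concrete, here is a minimal sketch of the calculation in plain Python, independent of how the data ends up stored in Cassandra. The record layout (dicts with "user", "ts" and "x" keys) and the function names are hypothetical, made up only for illustration.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=15)


def window_start(ts):
    """Floor a timestamp to the start of its 15-minute window."""
    return ts - timedelta(minutes=ts.minute % 15,
                          seconds=ts.second,
                          microseconds=ts.microsecond)


def average_field(records, user, field, start, end):
    """Average `field` for one user over the half-open interval [start, end).

    Returns None when the user has no records in that interval.
    """
    values = [r[field] for r in records
              if r["user"] == user and start <= r["ts"] < end]
    return sum(values) / len(values) if values else None


# Hypothetical sample records; real entries would carry about 20 fields.
records = [
    {"user": "alice", "ts": datetime(2012, 4, 18, 14, 1), "x": 10.0},
    {"user": "alice", "ts": datetime(2012, 4, 18, 14, 14), "x": 20.0},
    {"user": "alice", "ts": datetime(2012, 4, 18, 14, 15), "x": 99.0},
    {"user": "bob", "ts": datetime(2012, 4, 18, 14, 5), "x": 50.0},
]

start = window_start(datetime(2012, 4, 18, 14, 7, 30))
alice_avg = average_field(records, "alice", "x", start, start + WINDOW)
```

Each 15-minute result would then be stored keyed by user and window start, so a week of trending is 672 stored values per user per field (4 per hour x 24 hours x 7 days).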
Any suggestions on where I should head to start my journey into Cassandra for my particular application?
On Apr 18, 2012, at 2:14 PM, Janne Jalkanen wrote: