incubator-cassandra-user mailing list archives

From Mohan L <l.mohan...@gmail.com>
Subject data model to store large volume syslog
Date Thu, 07 Mar 2013 12:10:26 GMT
Dear All,

I am looking at Cassandra to store time-series data (mostly syslog). The volume
of data is very large, and many entries arrive with the same timestamp.
Each record contains the following fields:

timestamps:host-name:facility:message
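A minimal Python sketch of parsing one record in this format. It assumes the timestamp is Unix epoch seconds, because a colon-formatted time such as 12:10:26 would make the ':' delimiter ambiguous. `SyslogRecord` and `parse_record` are hypothetical names, and the sample line is made up. The message may itself contain colons, so the line is split at most three times.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SyslogRecord:
    timestamp: datetime
    host: str
    facility: str
    message: str


def parse_record(line: str) -> SyslogRecord:
    """Parse 'timestamp:host-name:facility:message' (timestamp assumed to be epoch seconds)."""
    # maxsplit=3 keeps any colons inside the message intact
    ts, host, facility, message = line.rstrip("\n").split(":", 3)
    return SyslogRecord(
        timestamp=datetime.fromtimestamp(int(ts), tz=timezone.utc),
        host=host,
        facility=facility,
        message=message,
    )


rec = parse_record("1362658226:web01:auth:Failed password for root: port 22")
print(rec.host, rec.facility, rec.message)  # web01 auth Failed password for root: port 22
```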

The following things need to be monitored:

1). Get data between time X and Y.
2). Get data between time X and Y for a given host-name.
3). Search for a 'pattern' in the message.
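For query 3), as far as I know Cassandra's secondary indexes only support equality lookups, so a 'pattern' search over message text would typically be done client-side over the rows returned by a time-range query, or with a separate search index. A minimal sketch of such a client-side filter, assuming the messages have already been fetched as (host, message) pairs; `grep_messages` and the sample data are hypothetical:

```python
import re
from typing import Iterable


def grep_messages(rows: Iterable[tuple[str, str]], pattern: str) -> list[tuple[str, str]]:
    """Return the (host, message) pairs whose message matches the regex `pattern`."""
    regex = re.compile(pattern)  # compile once, reuse for every row
    return [(host, msg) for host, msg in rows if regex.search(msg)]


fetched = [
    ("web01", "Failed password for root from 10.0.0.5"),
    ("db01", "checkpoint complete"),
    ("web02", "Failed password for admin from 10.0.0.9"),
]
matches = grep_messages(fetched, r"Failed password for \w+")
print(matches)
```

This scans every fetched message, so its cost grows with the size of the time range being searched.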

The data model design I am considering is:

1). Create a column family 'cfrawlog' that stores the raw log as received. The
row key could be 'yyyyddmmhh' (a new row is added every hour or less), each
column name is a UUID, and the column value is the raw log line. Since we are
also going to use this log for forensic purposes, it helps to have every raw
log line in this column family, with nothing missing.
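A minimal sketch of this 'cfrawlog' write path, using a Python dict as a stand-in for the column family (no Cassandra client involved). `raw_row_key` and `store_raw` are hypothetical helpers. The column name is a time-based `uuid.uuid1()`, so two lines with the same timestamp still get distinct columns. In Cassandra itself a TimeUUIDType comparator would order such columns by time; the dict here does not model that ordering.

```python
import uuid
from collections import defaultdict
from datetime import datetime, timezone

# In-memory stand-in for 'cfrawlog': row key -> {column name (UUID) -> raw log line}
cfrawlog: dict[str, dict[uuid.UUID, str]] = defaultdict(dict)


def raw_row_key(ts: datetime) -> str:
    # Hourly bucket using the 'yyyyddmmhh' pattern proposed above
    return ts.strftime("%Y%d%m%H")


def store_raw(ts: datetime, raw_line: str) -> tuple[str, uuid.UUID]:
    row_key = raw_row_key(ts)
    column_name = uuid.uuid1()  # time-based UUID: unique even for identical timestamps
    cfrawlog[row_key][column_name] = raw_line
    return row_key, column_name


ts = datetime(2013, 3, 7, 12, 10, 26, tzinfo=timezone.utc)
key, col = store_raw(ts, "1362658226:web01:auth:Failed password")
print(key)  # 2013070312
```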

2). I want to create one more column family that holds the parsed log, and we
will use this column family for queries. My question is: how should this CF be
modeled so that it can answer the queries above? What would be the row key
for this CF?
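One commonly used approach for queries 1) and 2) is to denormalize: write each parsed entry into two wide-row layouts, one keyed by hour bucket and one keyed by (host-name, hour bucket), with columns sorted by (timestamp, UUID). A time-range query then walks the hour buckets between X and Y and slices each row. The sketch below simulates this with in-memory dicts of sorted lists. The layout, `insert`, and `query_range` are hypothetical and are not a Cassandra API; the hourly bucket reuses the 'yyyyddmmhh' key pattern from above.

```python
import bisect
import uuid
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Iterator, Optional

# A "column" is (timestamp, unique id, value); tuples sort by timestamp first,
# then by UUID, mimicking a composite (time, uuid) column name.
Column = tuple[datetime, uuid.UUID, str]

# Hypothetical stand-ins for two parsed-log column families.
by_hour: dict[str, list[Column]] = defaultdict(list)                   # row key: hour bucket
by_host_hour: dict[tuple[str, str], list[Column]] = defaultdict(list)  # row key: (host, hour bucket)


def hour_bucket(ts: datetime) -> str:
    # Same 'yyyyddmmhh' pattern as the raw-log row key.
    return ts.strftime("%Y%d%m%H")


def insert(ts: datetime, host: str, facility: str, message: str) -> None:
    column = (ts, uuid.uuid1(), f"{host}:{facility}:{message}")
    bucket = hour_bucket(ts)
    bisect.insort(by_hour[bucket], column)               # serves query 1
    bisect.insort(by_host_hour[(host, bucket)], column)  # serves query 2


def buckets_between(x: datetime, y: datetime) -> Iterator[str]:
    # Every hour bucket that can contain entries in [x, y].
    cur = x.replace(minute=0, second=0, microsecond=0)
    while cur <= y:
        yield hour_bucket(cur)
        cur += timedelta(hours=1)


def query_range(x: datetime, y: datetime, host: Optional[str] = None) -> list[Column]:
    """Entries with x <= timestamp <= y, optionally restricted to one host."""
    result: list[Column] = []
    for bucket in buckets_between(x, y):
        row = by_hour.get(bucket, []) if host is None else by_host_hour.get((host, bucket), [])
        result.extend(c for c in row if x <= c[0] <= y)
    return result


t0 = datetime(2013, 3, 7, 12, 10, 26, tzinfo=timezone.utc)
insert(t0, "web01", "auth", "login failed")
insert(t0 + timedelta(hours=1), "db01", "daemon", "restarted")
insert(t0 + timedelta(hours=2), "web01", "kern", "out of memory")
print(len(query_range(t0, t0 + timedelta(hours=1))))                  # → 2
print(len(query_range(t0, t0 + timedelta(hours=2), host="web01")))    # → 2
```

Bucketing by hour keeps each row bounded in size; whether an hour is the right bucket width would depend on the actual volume per host.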

3). Does the above data model make sense?

Any help and suggestion would be greatly appreciated.


Thanks
Mohan L
