cassandra-user mailing list archives

From Mohan L <>
Subject Re: data model to store large volume syslog
Date Wed, 13 Mar 2013 11:23:30 GMT
On Fri, Mar 8, 2013 at 9:42 PM, aaron morton <> wrote:

> > 1). create a column family 'cfrawlog' which stores raw log as received.
> row key could be 'yyyyddmmhh'(new row is added for each hour or less), each
> 'column name' is uuid with 'value' is raw log data. Since we are also going
> to use this log for forensics purpose, so it will help us to have all raw
> log with in the column family without missing.
> As Moshe said there is a chance of hot spotting if you are sending all
> writes to a certain row.
> You also need to consider how big the row will get, in general stay below
> about 30MB. You can go higher but there are some implications.
> > 2). I want to create one more column family which is going to have the
> parsed log so that we will use this column family to query. my question is
> How to model this CF so that it will give answer of the above question?
> what would be the row key for this CF?
> Something like:
> row_key: YYYYMMDD
> column: <host:timestamp:>
> Note, i've not considered how to handle duplicate time stamps from the
> same host

I have created a standard column family with:

row_key : <YYYYMMDDHH:hostname>
Column_Name  : <timestamp:hostname>
Column_Value (as JSON dump) : {"date": "2013-03-05 06:21:56", "hostname": "", "error_message": "Starting checkpoint of DB.db at Tue Mar 05 2013 06:21"}
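To make the model concrete, and to cover the duplicate-timestamp case raised in the quoted reply, here is a minimal sketch of how a column name could be built so that two entries with the same timestamp and host do not collide. The helper name make_column_name, the uuid suffix, and the host name "host-a" are illustrative assumptions, not part of the model above; a plain dict stands in for one row.

```python
import uuid


def make_column_name(timestamp, hostname):
    """Build '<timestamp>:<hostname>:<unique suffix>'.

    The uuid4 suffix is an assumption added for illustration; it keeps
    names unique when the same host logs twice in the same second.
    """
    return "%s:%s:%s" % (timestamp, hostname, uuid.uuid4().hex)


# A plain dict stands in for one row (column name -> column value).
row = {}
for message in ("first message", "second message"):
    row[make_column_name("2013-03-05 06:21:56", "host-a")] = message
```

Because the timestamp stays at the front of the name, columns should still sort by time under a text comparator; only the tail of the name differs.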

I have two questions about the above model:

1). If the column_name is the same for a given row_key, Cassandra will
overwrite the column_value. Is there any way to append to the value in the
same column instead (say, insert the first time and append the next time)?
Does my question make sense?

2). Is there any way to search/filter based on column_value? If that is not
possible, what is the workaround to achieve this kind of column_value-based
search/filter in Cassandra?

For example, the query below returns a subrange of the columns in a row,
that is, all values within the range. How can I filter that subrange output
based on the column_value?

key = ''
result = col_fam.get(key,column_start='2013-03-05',
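To illustrate the kind of filtering meant here, the sketch below filters a result on the client side after the range query has returned. It assumes the result is a dict-like mapping of column name to JSON string; the helper name filter_columns, the sample data, and the host names are hypothetical, and no Cassandra connection is involved.

```python
import json
from collections import OrderedDict


def filter_columns(columns, predicate):
    """Keep only columns whose decoded JSON value satisfies predicate."""
    matched = OrderedDict()
    for name, raw_value in columns.items():
        try:
            record = json.loads(raw_value)
        except (TypeError, ValueError):
            # Skip values that are not valid JSON.
            continue
        if predicate(record):
            matched[name] = record
    return matched


# Hypothetical stand-in for the mapping returned by a column range query.
sample = OrderedDict([
    ("2013-03-05 06:21:56:host-a",
     json.dumps({"date": "2013-03-05 06:21:56", "hostname": "host-a",
                 "error_message": "Starting checkpoint of DB.db"})),
    ("2013-03-05 06:22:10:host-b",
     json.dumps({"date": "2013-03-05 06:22:10", "hostname": "host-b",
                 "error_message": "Connection refused"})),
])

hits = filter_columns(
    sample, lambda rec: "checkpoint" in rec.get("error_message", ""))
```

This reads every column in the range and discards the non-matching ones in the client, so it only suits ranges small enough to fetch in full.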

Any help and suggestion would be greatly appreciated.

Mohan L
