incubator-cassandra-user mailing list archives

From Miguel Verde <miguelitov...@gmail.com>
Subject Re: Best way to store millisecond-accurate data
Date Sat, 24 Apr 2010 00:54:54 GMT
TimeUUID's time component is measured in 100-nanosecond intervals. The
library you use might calculate it with poorer accuracy or precision,
but from a storage and comparison standpoint in Cassandra, millisecond
data is easily captured by it.
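
To make that concrete, here is a minimal Java sketch (not from the
thread; the class and constant names are made up) that packs a Unix
millisecond timestamp into a version-1 UUID. One millisecond is 10,000
of the UUID's 100-nanosecond ticks, so nothing is lost:

    import java.util.Random;
    import java.util.UUID;

    public class TimeUuidSketch {
        // Offset between the UUID v1 epoch (1582-10-15) and the Unix
        // epoch (1970-01-01), in 100-nanosecond ticks.
        private static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

        // Build a version-1 UUID from a Unix timestamp in milliseconds.
        static UUID fromMillis(long unixMillis, long clockSeqAndNode) {
            long ticks = unixMillis * 10000L + UUID_EPOCH_OFFSET;
            long msb = ((ticks & 0xFFFFFFFFL) << 32)       // time_low
                     | (((ticks >>> 32) & 0xFFFFL) << 16)  // time_mid
                     | 0x1000L                             // version 1
                     | ((ticks >>> 48) & 0x0FFFL);         // time_hi
            return new UUID(msb, clockSeqAndNode);
        }

        public static void main(String[] args) {
            long millis = System.currentTimeMillis();
            // Variant bits 10xx must lead the low half; a random clock
            // sequence and node are fine for a demonstration.
            long lsb = 0x8000000000000000L | (new Random().nextLong() >>> 2);
            UUID u = fromMillis(millis, lsb);
            // Round-trips exactly: prints true.
            System.out.println(
                (u.timestamp() - UUID_EPOCH_OFFSET) / 10000L == millis);
        }
    }

In practice your client library will typically generate these for you;
the sketch only shows that the 60-bit tick field resolves milliseconds
with room to spare.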

One typical way of dealing with the data explosion of sampled time
series data is to bucket/shard rows (e.g. Bob-20100423-bloodpressure)
so that you put an upper bound on the row length.
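
A minimal sketch of that bucketing, assuming the key layout from the
example above (the helper name is made up); at 125 Hz, a one-day bucket
caps a row at 10,800,000 columns:

    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.TimeZone;

    public class RowKeySketch {
        // Compose a day-bucketed row key, e.g. "Bob-20100423-bloodpressure".
        static String rowKey(String patient, String parameter, long unixMillis) {
            SimpleDateFormat day = new SimpleDateFormat("yyyyMMdd");
            day.setTimeZone(TimeZone.getTimeZone("UTC"));
            return patient + "-" + day.format(new Date(unixMillis))
                    + "-" + parameter;
        }

        public static void main(String[] args) {
            // 125 samples/s * 86,400 s/day = 10,800,000 columns per row.
            System.out.println(
                rowKey("Bob", "bloodpressure", System.currentTimeMillis()));
        }
    }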

On Apr 23, 2010, at 7:01 PM, Andrew Nguyen <andrew-lists-cassandra@ucsfcti.org> wrote:

> Hello,
>
> I am looking to store patient physiologic data in Cassandra - it's  
> being collected at rates of 1 to 125 Hz.  I'm thinking of storing  
> the timestamps as the column names and the patient/parameter combo  
> as the row key.  For example, Bob is in the ICU and is currently  
> having his blood pressure, intracranial pressure, and heart rate  
> monitored.  I'd like to collect this with the following row keys:
>
> Bob-bloodpressure
> Bob-intracranialpressure
> Bob-heartrate
>
> The column names would be timestamps but that's where my questions  
> start:
>
> I'm not sure what the best data type and CompareWith would be.  From  
> my searching, it sounds like the TimeUUID may be suitable but isn't  
> really designed for millisecond accuracy.  My other thought is just  
> to store them as strings (2010-04-23 10:23:45.016).  While space
> isn't the foremost concern, we will be collecting this data 24/7 so  
> we'll be creating many columns over the long-term.
>
> I found https://issues.apache.org/jira/browse/CASSANDRA-16 which  
> states that the entire row must fit in memory.  Does this include  
> the values as well as the column names?
>
> In considering the limits of cassandra and the best way to model
> this, we would be adding 3.9 billion columns per year (assuming 125
> Hz @ 24/7: 125 * 86,400 s/day * 365 days ~= 3.9 billion).  However,
> I can't really think of a better way to model this...  So, am I
> thinking about this all wrong or am I on the right track?
>
> Thanks,
> Andrew
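
On the CompareWith question above, a 0.6-era storage-conf.xml sketch
(the keyspace and column family names are made up, and replication
settings are omitted) would declare a TimeUUID comparator so columns
sort chronologically:

    <Keyspace Name="ICU">
      <!-- Columns (one per sample) sort by their TimeUUID timestamps. -->
      <ColumnFamily Name="Vitals" CompareWith="TimeUUIDType"/>
    </Keyspace>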
