incubator-cassandra-user mailing list archives

From Erik Holstad <erikhols...@gmail.com>
Subject Re: Best way to store millisecond-accurate data
Date Sat, 24 Apr 2010 00:58:55 GMT
On Fri, Apr 23, 2010 at 5:54 PM, Miguel Verde <miguelitovert@gmail.com> wrote:

> TimeUUID's time component is measured in 100-nanosecond intervals. The
> library you use might calculate it with poorer accuracy or precision, but
> from a storage/comparison standpoint in Cassandra millisecond data is easily
> captured by it.
>
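
To illustrate the point about 100-nanosecond intervals, here is a minimal Python sketch that recovers a millisecond Unix timestamp from a version 1 (time-based) UUID. It relies only on the standard `uuid` module; the helper names `uuid1_to_unix_ms` and `unix_ms_to_uuid1_time` are made up for this example and are not part of any Cassandra client.

```python
import uuid

# 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000
INTERVALS_PER_MS = 10_000  # 1 ms = 10,000 intervals of 100 ns


def uuid1_to_unix_ms(u: uuid.UUID) -> int:
    """Return the Unix time in milliseconds embedded in a version 1 UUID."""
    if u.version != 1:
        raise ValueError("not a time-based (version 1) UUID")
    # u.time is the 60-bit timestamp counted in 100-ns intervals.
    return (u.time - UUID_EPOCH_OFFSET) // INTERVALS_PER_MS


def unix_ms_to_uuid1_time(ts_ms: int) -> int:
    """Return the 60-bit UUID timestamp field for a Unix time in milliseconds."""
    return ts_ms * INTERVALS_PER_MS + UUID_EPOCH_OFFSET


if __name__ == "__main__":
    u = uuid.uuid1()
    print(u, uuid1_to_unix_ms(u))
```

Since one millisecond spans 10,000 of these intervals, a millisecond timestamp loses nothing when stored in the UUID's time field; whether a given library fills in the sub-millisecond digits accurately is a separate question.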
> One typical way of dealing with the data explosion of sampled time series
> data is to bucket/shard rows (i.e. Bob-20100423-bloodpressure) so that you
> put an upper bound on the row length.
>
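
A rough sketch of the bucketing idea, assuming one bucket per UTC day and the key layout from the example above (patient, date, parameter). The function name `bucketed_row_key` is hypothetical; a different bucket size (hour, week) would work the same way.

```python
from datetime import datetime, timezone


def bucketed_row_key(patient: str, parameter: str, ts_ms: int) -> str:
    """Build a row key such as 'Bob-20100423-bloodpressure' from a Unix time in ms."""
    # Assumption: buckets are aligned to UTC calendar days.
    day = datetime.fromtimestamp(ts_ms // 1000, tz=timezone.utc).strftime("%Y%m%d")
    return f"{patient}-{day}-{parameter}"


if __name__ == "__main__":
    print(bucketed_row_key("Bob", "bloodpressure", 1272018225016))
```

With daily buckets, a 125 Hz signal adds at most 125 * 86,400 = 10,800,000 columns to any one row.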
>
> On Apr 23, 2010, at 7:01 PM, Andrew Nguyen <
> andrew-lists-cassandra@ucsfcti.org> wrote:
>
>> Hello,
>>
>> I am looking to store patient physiologic data in Cassandra - it's being
>> collected at rates of 1 to 125 Hz.  I'm thinking of storing the timestamps
>> as the column names and the patient/parameter combo as the row key.  For
>> example, Bob is in the ICU and is currently having his blood pressure,
>> intracranial pressure, and heart rate monitored.  I'd like to collect this
>> with the following row keys:
>>
>> Bob-bloodpressure
>> Bob-intracranialpressure
>> Bob-heartrate
>>
>> The column names would be timestamps but that's where my questions start:
>>
>> I'm not sure what the best data type and CompareWith would be.  From my
>> searching, it sounds like the TimeUUID may be suitable but isn't really
>> designed for millisecond accuracy.  My other thought is just to store them
>> as strings (2010-04-23 10:23:45.016).  While space isn't the foremost
>> concern, we will be collecting this data 24/7 so we'll be creating many
>> columns over the long-term.
>>
You could just use an 8-byte millisecond timestamp and store that as part
of the key.
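
As a sketch of what an 8-byte millisecond timestamp could look like on the wire, the snippet below packs a Unix time in milliseconds as a big-endian 64-bit integer using Python's `struct` module. The function names are hypothetical, and the big-endian layout is an assumption chosen so that, for non-negative timestamps, byte order matches chronological order.

```python
import struct


def pack_ms(ts_ms: int) -> bytes:
    """Encode a millisecond timestamp as 8 bytes (signed 64-bit, big-endian)."""
    return struct.pack(">q", ts_ms)


def unpack_ms(raw: bytes) -> int:
    """Decode 8 bytes produced by pack_ms back into a millisecond timestamp."""
    return struct.unpack(">q", raw)[0]


if __name__ == "__main__":
    raw = pack_ms(1272018225016)
    print(len(raw), unpack_ms(raw))
```

Compared with a string such as "2010-04-23 10:23:45.016" (23 bytes), this stores the same instant in 8 bytes.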

>
>> I found https://issues.apache.org/jira/browse/CASSANDRA-16 which states
>> that the entire row must fit in memory.  Does this include the values as
>> well as the column names?
>>
Yes. The alternative is to store one insert per row. You will not be able
to do backwards slices that way without an extra index, but you can scale
much better.

>
>> In considering the limits of cassandra and the best way to model this, we
>> would be adding 3.9 billion rows per year (assuming 125 Hz @ 24/7).
>>  However, I can't really think of a better way to model this...  So, am I
>> thinking about this all wrong or am I on the right track?
>>
>> Thanks,
>> Andrew
>>
>
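
The 3.9 billion figure quoted above can be checked with a few lines of arithmetic, assuming a single signal sampled at a constant 125 Hz around the clock for a 365-day year:

```python
SAMPLE_RATE_HZ = 125
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

samples_per_day = SAMPLE_RATE_HZ * SECONDS_PER_DAY
samples_per_year = samples_per_day * 365

if __name__ == "__main__":
    print(f"{samples_per_day:,} per day, {samples_per_year:,} per year")
```

That is per patient and per parameter, so monitoring three signals for one patient would triple it.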


-- 
Regards Erik
