incubator-cassandra-user mailing list archives

From Thorvaldsson Justus <justus.thorvalds...@svenskaspel.se>
Subject SV: Using Cassandra for storing measurement data
Date Tue, 03 Aug 2010 07:37:27 GMT
It sounds to me like it's a good idea to use Cassandra in your case. I figured I would help,
as we Europeans need to cooperate, even though I have only worked with Cassandra for a month.
=)

1:
What is the query you want to use when charting the data? Use it to decide how to store
and sort your data.
2:
Where is your row? You must model it correctly. I added my explanation here: http://x0613.orbbox.com/blog/662/8567/
(http://www.justus.st/)
SCF-ROW-SC-C (super column family, row, super column, column)
or
CF-ROW-C (column family, row, column)
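
As a rough illustration of the two layouts, here is a minimal Python sketch using plain nested dicts. It is only a mental model, not Cassandra client code; the names measurements_cf, measurements_scf and the row key "all-devices" are made up for the example.

```python
# CF-ROW-C: column family -> row key -> column name -> value
measurements_cf = {
    "device1": {1280819205: "10", 1280819305: "15", 1280819405: "10"},
    "device2": {1280819205: "20", 1280819305: "15", 1280819405: "20"},
}

# SCF-ROW-SC-C: super column family -> row key -> super column -> column -> value
measurements_scf = {
    "all-devices": {  # hypothetical single row key holding every device
        "device1": {1280819205: "10", 1280819305: "15", 1280819405: "10"},
        "device2": {1280819205: "20", 1280819305: "15", 1280819405: "20"},
    },
}

# Two lookups are enough in the standard layout, three in the super layout.
value_cf = measurements_cf["device1"][1280819305]
value_scf = measurements_scf["all-devices"]["device1"][1280819305]
print(value_cf, value_scf)
```

In the first layout each device is its own row, so "where is your row?" has a different answer than in the second layout, where every device ends up in the same row.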
3:
There are some limitations:
2 GB of data in a row in 0.6, 2 billion columns in a row in 0.7.
And
a row must fit on a single node.
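
To get a feeling for these limits with one measurement per minute, a back-of-the-envelope calculation like the following may help. It is only a sketch: the 32 bytes per column is an assumed average, not a measured figure, and the layout assumed is one row per device with one column per measurement.

```python
# Rough capacity estimate for one row per device (sizes are assumptions).
MINUTES_PER_YEAR = 60 * 24 * 365      # one measurement per minute
BYTES_PER_COLUMN = 32                 # ASSUMED average stored size of one column
ROW_LIMIT_BYTES_06 = 2 * 1024**3      # 2 GB per row (0.6, as stated above)
COLUMN_LIMIT_07 = 2 * 10**9           # 2 billion columns per row (0.7)

def years_until_limits(bytes_per_column=BYTES_PER_COLUMN):
    """Return (years until the 0.6 size limit, years until the 0.7 column limit)."""
    bytes_per_year = MINUTES_PER_YEAR * bytes_per_column
    return (ROW_LIMIT_BYTES_06 / bytes_per_year,
            COLUMN_LIMIT_07 / MINUTES_PER_YEAR)

size_years, column_years = years_until_limits()
print(f"columns per device per year: {MINUTES_PER_YEAR}")
print(f"years until 2 GB row (0.6): {size_years:.0f}")
print(f"years until 2 billion columns (0.7): {column_years:.0f}")
```

Under these assumptions a single device would need a very long time to reach either limit, so the third limitation (the whole row living on one node) is probably the one to think about first.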
4:
You asked: "For my range-selections - I think I need the OrderPreservingPartitioner. Right?"
I don't think you need it; just sort the columns by measurement time. You don't need it because
an entire row always lives on the same node, and the OrderPreservingPartitioner is only about
keeping row keys in order.
You will have to check again how columns and supercolumns are sorted. I haven't added my
bookmarks to the blog yet, but http://www.sodeso.nl/?p=421
was a good source of information, I think. There is more on the same blog as well.
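
The point about sorting inside a row can be sketched with a toy model in Python. This is not a Cassandra API, only an illustration: the Row class and its insert/slice methods are invented for the example, and a real cluster would need a column comparator that sorts the timestamps numerically.

```python
from bisect import bisect_left, bisect_right

class Row:
    """Toy stand-in for one row: columns are kept sorted by column name."""

    def __init__(self):
        self._names = []    # sorted measurement times (column names)
        self._values = {}   # column name -> value

    def insert(self, name, value):
        # Keep the name list sorted; overwriting an existing column keeps its slot.
        if name not in self._values:
            self._names.insert(bisect_left(self._names, name), name)
        self._values[name] = value

    def slice(self, start, finish):
        """Return (name, value) pairs with start <= name <= finish, in order."""
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, finish)
        return [(n, self._values[n]) for n in self._names[lo:hi]]

# Row keys may be spread over nodes in any order; each row is still sorted inside.
rows = {"device1": Row(), "device2": Row()}
for t, v in [(1280819405, "10"), (1280819205, "10"), (1280819305, "15")]:
    rows["device1"].insert(t, v)

result = rows["device1"].slice(1280819205, 1280819305)
print(result)
```

The time range is answered entirely inside the row for "device1", which is why the ordering of row keys across nodes does not matter for this query.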
5:
There are always alternative designs. You should not give up too early, as this is one of the
most important decisions.
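
One possible alternative, purely as a sketch and not something I have tested on a cluster, is a plain column family with one row per device and month, so that no single row grows forever. The key format "device:YYYY-MM" and both function names below are assumptions made up for this example.

```python
from datetime import datetime, timezone

def row_key(device_id, measured_at):
    """Hypothetical row key: one row per device per calendar month (UTC)."""
    moment = datetime.fromtimestamp(measured_at, tz=timezone.utc)
    return f"{device_id}:{moment:%Y-%m}"

def row_keys_for_range(device_id, start, finish):
    """List the row keys that the time range [start, finish] touches, in order."""
    first = datetime.fromtimestamp(start, tz=timezone.utc)
    last = datetime.fromtimestamp(finish, tz=timezone.utc)
    keys = []
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        keys.append(f"{device_id}:{year:04d}-{month:02d}")
        # Step to the next calendar month, rolling over at December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return keys

print(row_key("device1", 1280819205))
print(row_keys_for_range("device1", 1280819205, 1280819205 + 40 * 86400))
```

A chart over a time range would then read a column slice from each of the listed rows and concatenate the results; the measurement time stays the column name inside every row.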
6:
Have a nice day, Stefan.

/Justus



-----Original message-----
From: Stefan Kaufmann [mailto:staeff@gmail.com]
Sent: 3 August 2010 09:21
To: user@cassandra.apache.org
Subject: Using Cassandra for storing measurement data

Dear Cassandra Users,

I'm quite new to Cassandra and I'm still trying to figure out whether I'm
on the right path for my requirements.
I would like to explain my Cassandra design and hope to receive feedback on
whether it would work.

I would like to use Cassandra to store measurement data from several
devices. Each device reports a value every minute, so there will be about
500 000 entries per device every year.
The following data has to be stored:
 - device ID
 - measurement Time (of course different to the Cassandra time-stamp)
 - measurement value

Later, the data should be charted - so I need to select time-ranges
from a device.



My current solution is a super-column:
{
    name: "device1",
    value: {
        // measurement timestamps..
        1280819205: {name: "value", value: "10", timestamp: 123456789},
        1280819305: {name: "value", value: "15", timestamp: 123456789},
        1280819405: {name: "value", value: "10", timestamp: 123456789},
        //there will be millions of entries
    }
},
{
    name: "device2",
    value: {
        // measurement timestamps..
        1280819205: {name: "value", value: "20", timestamp: 123456789},
        1280819305: {name: "value", value: "15", timestamp: 123456789},
        1280819405: {name: "value", value: "20", timestamp: 123456789},
        //there will be millions of entries
    }
}

My questions:
My main concern is the huge number of subcolumns I'm using. All the
Cassandra examples I saw on the web used them to store only a few
columns (like a user profile).
So would this work with millions of entries?

For my range-selections - I think I need the OrderPreservingPartitioner. Right?

Are there alternative designs? Maybe one without a Super-column? I
can't think of one..

I'm looking forward to some answers,
Thanks in advance,
Stefan
