incubator-cassandra-user mailing list archives

From Stefan Kaufmann <sta...@gmail.com>
Subject Re: SV: Using Cassandra for storing measurement data
Date Tue, 03 Aug 2010 11:17:55 GMT
Thank you very much for those suggestions. I see that I'm still thinking
too much in RDBMS terms.
I'll try those out.

Stefan

On Tue, Aug 3, 2010 at 11:44 AM, Aaron Morton <aaron@thelastpickle.com> wrote:
> As Justus said, you need to consider the way you want to get the data back
> and then denormalise to suit. Do you need to support ad-hoc queries or will
> you know how you want to query ahead of time?
> Some different approaches may be:
> Standard CF to hold the measurements taken, grouped by day
> {
> device_id/20100810 : { date_and_time : value,
>                                   date_and_time : value
>                                }
> }
> - this spreads the writes for different devices around the cluster, but the
> same nodes are used for every write for one device on a given day.
> - you can read all the measurements for one device for one day in one get
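
A minimal sketch of the day-bucketed layout above, using plain Python dicts rather than a Cassandra client. The helper names `row_key` and `bucket_by_day` and the sample values are hypothetical; only the `device_id/YYYYMMDD` key form comes from the message.

```python
from collections import defaultdict
from datetime import datetime, timezone


def row_key(device_id: str, measured_at: datetime) -> str:
    """Build a row key in the device_id/YYYYMMDD form shown above."""
    return f"{device_id}/{measured_at:%Y%m%d}"


def bucket_by_day(measurements):
    """Group (device_id, measured_at, value) tuples into one row per device and day."""
    rows = defaultdict(dict)
    for device_id, measured_at, value in measurements:
        # The column name is the full measurement time. ISO 8601 strings in a
        # single time zone sort chronologically, which suits range reads.
        rows[row_key(device_id, measured_at)][measured_at.isoformat()] = value
    return dict(rows)


# Hypothetical sample data: two readings on one day, one on the next.
samples = [
    ("device1", datetime(2010, 8, 10, 9, 0, tzinfo=timezone.utc), 10),
    ("device1", datetime(2010, 8, 10, 9, 1, tzinfo=timezone.utc), 15),
    ("device1", datetime(2010, 8, 11, 9, 0, tzinfo=timezone.utc), 10),
]
rows = bucket_by_day(samples)
```

Reading one device for one day then amounts to a single lookup on `rows["device1/20100810"]`.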
> Super CF to hold all the measures for a day, with super columns for the
> device
> {
> 20100810 : {
>     device_id : {
>         date_and_time : value
>     }
> }
> }
> - this concentrates the write load for a single day on the same nodes for
> all devices.
> - may not be practicable if you have a lot of devices
> - you can read all the measurements for all devices for a single day in one
> get
> Standard CF to store each measurement as a row by itself.
> {
> device/date_and_time : {
>     "timestamp" : date_and_time,
>     "measurement" : "the value"
>     }
> }
> - this spreads every write around the cluster for every device and day
> - You can then also write the values into aggregate CFs, say grouped by day
> or device as above. If you ever want to build new aggregates, you can derive
> them from the raw data in this CF.
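
A rough sketch of the "raw CF plus aggregate CFs" idea above. Plain dicts stand in for column families, and the names `raw_cf`, `by_device_day_cf`, `by_day_cf`, `record` and `rebuild_by_device_day` are hypothetical. It also assumes `date_and_time` is a `YYYYMMDDHHMMSS` string, which the message does not specify.

```python
from collections import defaultdict

# Plain dicts stand in for column families: {row_key: {column_name: value}}.
raw_cf = {}
by_device_day_cf = defaultdict(dict)
# Stand-in for the super CF: day -> device (super column) -> columns.
by_day_cf = defaultdict(lambda: defaultdict(dict))


def record(device_id, date_and_time, value):
    """Write one measurement to the raw CF and to both aggregate layouts."""
    day = date_and_time[:8]  # assumes a 'YYYYMMDDHHMMSS' string
    raw_cf[f"{device_id}/{date_and_time}"] = {
        "timestamp": date_and_time,
        "measurement": value,
    }
    by_device_day_cf[f"{device_id}/{day}"][date_and_time] = value
    by_day_cf[day][device_id][date_and_time] = value


def rebuild_by_device_day(raw):
    """Derive the per-device, per-day aggregate again from the raw rows alone."""
    rebuilt = defaultdict(dict)
    for key, columns in raw.items():
        device_id = key.rsplit("/", 1)[0]
        ts = columns["timestamp"]
        rebuilt[f"{device_id}/{ts[:8]}"][ts] = columns["measurement"]
    return rebuilt


record("device1", "20100810090000", "10")
record("device1", "20100810090100", "15")
record("device2", "20100810090000", "20")
```

The point of `rebuild_by_device_day` is that any aggregate can be regenerated, or a new one introduced later, as long as the raw rows are kept.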
> Try out some different ideas and see how easy it is to do your reporting.
>
> This post from Cloudkick may help:
> https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra/
> Aaron
> On 03 Aug, 2010, at 07:37 PM, Thorvaldsson Justus
> <justus.thorvaldsson@svenskaspel.se> wrote:
>
> It sounds to me like it's a good idea to use Cassandra in your case. I
> figured I'd help, as we Europeans need to cooperate, even though I have only
> worked with Cassandra for a month. =)
>
> 1:
> What is the query you want to use when charting the data? Use it to decide
> how to store and sort your data.
> 2:
> Where is your row? You must model it correctly. I added my explanation here:
> http://x0613.orbbox.com/blog/662/8567/ (http://www.justus.st/)
> SCF -> row -> super column -> column
> or
> CF -> row -> column
> 3:
> There are some limitations:
> 2GB of data in a row in 0.6, 2 billion columns in 0.7.
> And
> a row must fit on a node.
> 4:
> "For my range-selections - I think I need the OrderPreservingPartitioner.
> Right?"
> I don't think you need it, as long as the columns are sorted by the time of
> measurement. The reason is that an entire row always lives on the same node;
> the OrderPreservingPartitioner only concerns keeping row keys in order.
> You will have to check again how columns and super columns are sorted. I
> haven't added my bookmarks to the blog yet, but http://www.sodeso.nl/?p=421
> was a good source of information, I think. There is more on the same blog
> as well.
> 5:
> There are always alternative designs. You should not give up too early, as
> these are the most important decisions.
> 6:
> Have a nice day Stefan
>
> /Justus
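
To illustrate point 4, here is a toy model of a time-range read inside a single row. It is not Cassandra's API: the function `slice_columns` is hypothetical, and it assumes column names are integer timestamps kept in sorted order. Nothing in it depends on how row keys are ordered.

```python
from bisect import bisect_left, bisect_right


def slice_columns(sorted_names, values, start, finish):
    """Return (name, value) pairs with start <= name <= finish, like a column slice."""
    lo = bisect_left(sorted_names, start)
    hi = bisect_right(sorted_names, finish)
    return [(name, values[name]) for name in sorted_names[lo:hi]]


# One row for one device; column names are the measurement times from the
# original question.
row = {1280819205: "10", 1280819305: "15", 1280819405: "10"}
names = sorted(row)

print(slice_columns(names, row, 1280819300, 1280819500))
# prints [(1280819305, '15'), (1280819405, '10')]
```

Because the whole row sits on one node and its columns are already ordered, the range read only needs the row key plus a start and end column name.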
>
>
>
> -----Original message-----
> From: Stefan Kaufmann [mailto:staeff@gmail.com]
> Sent: 3 August 2010 09:21
> To: user@cassandra.apache.org
> Subject: Using Cassandra for storing measurement data
>
> Dear Cassandra Users,
>
> I'm quite new to Cassandra and I'm still trying to figure out whether I'm
> on the right path for my requirements.
> I'd like to explain my Cassandra design and hope to receive feedback on
> whether it would work.
>
> I'd like to use Cassandra to store measurement data from several
> devices. Each device reports every minute, so there will be about
> 500,000 entries per device every year.
> The following data has to be stored:
> - device ID
> - measurement time (of course different from the Cassandra timestamp)
> - measurement value
>
> Later, the data should be charted - so I need to select time-ranges
> from a device.
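
A quick check of the volume estimate above, as a sketch. The only inputs are the one-reading-per-minute rate from this message and the rounded "2 billion columns" figure quoted in Justus's reply. The variable names are illustrative.

```python
MINUTES_PER_DAY = 24 * 60

per_day = MINUTES_PER_DAY          # readings per device per day
per_year = MINUTES_PER_DAY * 365   # readings per device per (non-leap) year

# Rounded limit taken from the reply in this thread, not an exact constant.
COLUMN_LIMIT = 2_000_000_000
years_to_limit = COLUMN_LIMIT // per_year

print(per_day, per_year, years_to_limit)
# prints 1440 525600 3805
```

So a row per device and day holds 1,440 columns, and even a single row per device would stay far below the column-count figure. The row-size and row-must-fit-on-a-node limits are separate concerns that this arithmetic does not cover.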
>
>
>
> My current solution is a super column:
> {
> name: "device1",
> value: {
> // measurement timestamps..
> 1280819205: {name: "value", value: "10", timestamp: 123456789},
> 1280819305: {name: "value", value: "15", timestamp: 123456789},
> 1280819405: {name: "value", value: "10", timestamp: 123456789},
> //there will be millions of entries
> }
> name: "device2",
> value: {
> // measurement timestamps..
> 1280819205: {name: "value", value: "20", timestamp: 123456789},
> 1280819305: {name: "value", value: "15", timestamp: 123456789},
> 1280819405: {name: "value", value: "20", timestamp: 123456789},
> //there will be millions of entries
> }
> }
>
> My questions:
> My main concern is the huge number of subcolumns I'm using. All the
> Cassandra examples I have seen on the web use them to store only a few
> columns (like a user profile).
> So would this work with millions of entries?
>
> For my range-selections - I think I need the OrderPreservingPartitioner.
> Right?
>
> Are there alternative designs? Maybe one without a Super-column? I
> can't think of one..
>
> I'm looking forward to some answers,
> Thanks in advance,
> Stefan
>
