cassandra-user mailing list archives

From Zhiguo Zhang <mikewolfx...@gmail.com>
Subject Re: Time-series data model
Date Wed, 14 Apr 2010 13:18:51 GMT
First of all, I am a newbie to NoSQL, so please take my opinion only as a
suggestion:

If I were you, I would use two column families:

1. CF: the key is the device
2. CF: the key is a TimeUUID
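
To make the idea more concrete, here is a rough sketch in plain Python. It is
only one possible reading of the two column families: dictionaries stand in
for the CFs, no Cassandra client is involved, and the names
samples_by_device, samples_by_id and make_time_id are made up for
illustration. A (timestamp, counter) tuple is used as a stand-in for a real
TimeUUID. The sample values are taken from the question below.

```python
import bisect
import itertools

# In-memory stand-ins for the two column families (assumption, not Cassandra):
# samples_by_device: key = device, columns ordered by time id -> metric name
# samples_by_id:     key = time id, columns = the stored sample
samples_by_device = {}
samples_by_id = {}
_seq = itertools.count()


def make_time_id(ts):
    """Hypothetical TimeUUID stand-in: sorts by timestamp, then insert order."""
    return (ts, next(_seq))


def insert(device, metric, ts, value):
    time_id = make_time_id(ts)
    samples_by_id[time_id] = {"device": device, "metric": metric,
                              "ts": ts, "value": value}
    row = samples_by_device.setdefault(device, [])
    bisect.insort(row, (time_id, metric))  # keep the columns time-ordered


def last_value(device, metric):
    """Newest value of one metric of one device, or None if there is none."""
    for time_id, name in reversed(samples_by_device.get(device, [])):
        if name == metric:
            return samples_by_id[time_id]["value"]
    return None


def values_between(device, metric, t1, t2):
    """(ts, value) pairs with t1 <= ts <= t2, oldest first."""
    row = samples_by_device.get(device, [])
    lo = bisect.bisect_left(row, ((t1, -1), ""))
    hi = bisect.bisect_right(row, ((t2, float("inf")), ""))
    return [(samples_by_id[tid]["ts"], samples_by_id[tid]["value"])
            for tid, name in row[lo:hi] if name == metric]


for ts, value in [(1271248215, 87), (1271248220, 34),
                  (1271248225, 23), (1271248230, 49)]:
    insert("my_server_1", "cpu_usage", ts, value)
insert("my_server_1", "ping_response", 1271248221, 0.311)

print(last_value("my_server_1", "cpu_usage"))
print(values_between("my_server_1", "cpu_usage", 1271248220, 1271248225))
```

Note that in this reading both queries have to skip over the columns of other
metrics of the same device, so it may not scale well for devices with many
metrics.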

What do you think about that?

Mike


On Wed, Apr 14, 2010 at 3:02 PM, Jean-Pierre Bergamin <james@ractive.ch> wrote:

> Hello everyone
>
> We are currently evaluating a new DB system (replacing MySQL) to store
> massive amounts of time-series data. The data are various metrics from
> various network and IT devices and systems. Metrics could be, e.g., the CPU
> usage of server "xy" in percent, the memory usage of server "xy" in MB, the
> ping response time of server "foo" in milliseconds, the network traffic of
> router "bar" in MB/s and so on. Different metrics can be collected for
> different devices at different intervals.
>
> The metrics are stored together with a timestamp. The queries we want to
> perform are:
>  * The last value of a specific metric of a device
>  * The values of a specific metric of a device between two timestamps t1
>    and t2
>
> I stumbled across this blog post which describes a very similar setup with
> Cassandra:
> https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra/
> This post gave me confidence that what we want is definitely doable with
> Cassandra.
>
> But since I'm just digging into columns and super-columns and their
> families, I still have some problems understanding everything.
>
> In JSON-ish notation, our data model could look like this:
> {
> "my_server_1": {
>        "cpu_usage": {
>                {ts: 1271248215, value: 87 },
>                {ts: 1271248220, value: 34 },
>                {ts: 1271248225, value: 23 },
>                {ts: 1271248230, value: 49 }
>        }
>        "ping_response": {
>                {ts: 1271248201, value: 0.345 },
>                {ts: 1271248211, value: 0.423 },
>                {ts: 1271248221, value: 0.311 },
>                {ts: 1271248232, value: 0.582 }
>        }
> }
>
> "my_server_2": {
>        "cpu_usage": {
>                {ts: 1271248215, value: 23 },
>                ...
>        }
>        "disk_usage": {
>                {ts: 1271243451, value: 123445 },
>                ...
>        }
> }
>
> "my_router_1": {
>        "bytes_in": {
>                {ts: 1271243451, value: 2452346 },
>                ...
>        }
>        "bytes_out": {
>                {ts: 1271243451, value: 13468 },
>                ...
>        }
>        "errors": {
>                {ts: 1271243451, value: 24 },
>                ...
>        }
> }
> }
>
> What I don't get is how to create the two-level hierarchy
> [device][metric].
>
> Am I right that the devices would be kept in a super column family? The
> ordering of those is not important.
>
> But the metrics per device would also be super columns, whose columns would
> be the metric values ({ts: 1271243451, value: 24 }), wouldn't they?
>
> So I'd need a super column in a super column... Hm.
> My brain is definitely RDBMS-damaged and I don't see through columns and
> super-columns yet. :-)
>
> How could this be modeled in Cassandra?
>
>
> Thank you very much
> James
>
>
>
