cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nate Sammons <NSamm...@ften.com>
Subject Best way to store and index time series items with multiple other dimensions?
Date Thu, 16 Feb 2012 19:23:25 GMT
I'm trying to figure out the best way to store items for query based on multiple dimensions.
 I've got a large volume (many 100s of millions per day) of time-ordered objects with 10+
properties each that I need to support arbitrary query expressions on.  So I may need to support
a query based on a segment of time plus an expression like "A == 'foo' and B == 'bar' and
C == 'baz'" etc...  Any pointers?

For simple time-ordered retrieval I was going to have a set of time buckets used as row keys,
something like YYYY-MM-DD-HH, and with an extra character or two to reduce hotspots (probably
take a hash of the object and use the first byte of the hash in hex), so a row key might look
like:

   2012-02-16-09:a

This way I'm spreading writes for that hour across 16 rows.  Then the column name would be
a TimeUUID or some time-based value, and the column value would be the object.  This lets
me easily slice out segments of time, and lets me write data really well.

However if I need to satisfy a query for items matching some expression during the day, I
have to scan a *lot* of records.  I can require some property to always be present in the
query, and I can base the above extra byte in the row key, so when I scan records I can cut
down the number of row keys read by a factor of 16, but that's still a huge amount of data
to just scan through.

One obvious choice here are secondary indexes, but that implies "short" rows that can't be
time sliced as easily, and I don't know that have a bunch of secondary indexes will scale
very well (or support range queries).


Any ideas on a way to structure data for easy queries like this?


Thanks,

-nate



Nate Sammons | Sr. Technical Specialist | FTEN, A NASDAQ OMX Company
Office: +1.720.889.5141 | Email: nsammons@ften.com<mailto:nsammons@ften.com>
Aggregation.  Transparency.  Control. (tm)  | www.FTEN.com<http://ften.com/>


Mime
View raw message