incubator-cassandra-user mailing list archives

From Clint Kelly <clint.ke...@gmail.com>
Subject Getting the most-recent version from time-series data
Date Wed, 26 Feb 2014 00:48:38 GMT
Hi everyone,

Let's say that I have a table that looks like the following:

CREATE TABLE time_series_stuff (
  key text,
  family text,
  version int,
  val text,
  PRIMARY KEY (key, family, version)
) WITH CLUSTERING ORDER BY (family ASC, version DESC) AND
  bloom_filter_fp_chance=0.010000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.100000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

cqlsh:fiddle> select * from time_series_stuff ;

 key    | family  | version | val
--------+---------+---------+--------
 monday | revenue |       3 | $$$$$$
 monday | revenue |       2 |    $$$
 monday | revenue |       1 |     $$
 monday | revenue |       0 |      $
 monday | traffic |       2 | medium
 monday | traffic |       1 |  light
 monday | traffic |       0 |  heavy

(7 rows)

Now let's say that I'd like a query that returns the most recent N versions
of each of "revenue" and "traffic".

Is there a CQL query to do this? Let's say that N=1. Then I know that I
can run one query per family:

cqlsh:fiddle> select * from time_series_stuff where key='monday' and family='revenue' limit 1;

 key    | family  | version | val
--------+---------+---------+--------
 monday | revenue |       3 | $$$$$$

(1 rows)

cqlsh:fiddle> select * from time_series_stuff where key='monday' and family='traffic' limit 1;

 key    | family  | version | val
--------+---------+---------+--------
 monday | traffic |       2 | medium

(1 rows)
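Each of these queries returns the newest version only because the table clusters rows by version DESC within a family, so LIMIT 1 stops at the highest version. A minimal Python sketch of what one such per-family query computes, run against the sample rows above; the helper name latest_versions and the tuple layout are hypothetical, for illustration only:

```python
# Sample rows from the "monday" partition: (key, family, version, val).
ROWS = [
    ("monday", "revenue", 3, "$$$$$$"),
    ("monday", "revenue", 2, "$$$"),
    ("monday", "revenue", 1, "$$"),
    ("monday", "revenue", 0, "$"),
    ("monday", "traffic", 2, "medium"),
    ("monday", "traffic", 1, "light"),
    ("monday", "traffic", 0, "heavy"),
]


def latest_versions(rows, key, family, n):
    """Hypothetical helper emulating:
    SELECT * FROM time_series_stuff WHERE key=? AND family=? LIMIT n

    Filters one (key, family) slice, orders it as
    CLUSTERING ORDER BY (family ASC, version DESC) would, and keeps n rows.
    """
    matching = [r for r in rows if r[0] == key and r[1] == family]
    matching.sort(key=lambda r: r[2], reverse=True)  # version DESC
    return matching[:n]


if __name__ == "__main__":
    print(latest_versions(ROWS, "monday", "revenue", 1))
    print(latest_versions(ROWS, "monday", "traffic", 1))
```

This needs one query per family, so the family names have to be known up front.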

But what if I have lots of "families" and I want the most recent N versions
of all of them in a single CQL statement? Is that possible? Unfortunately,
I am working on something where neither the family names nor the number of
most-recent versions is known a priori (I am porting some code that was
designed for HBase).
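If no single statement can do this, one client-side fallback is to read the whole partition (select * from time_series_stuff where key='monday';) and keep only the first N rows of each family. A minimal sketch, assuming the rows arrive in clustering order (family ASC, version DESC) as in the select * output above, so a single pass with itertools.groupby is enough. The helper name latest_n_per_family is hypothetical, and fetching the rows through a driver is not shown:

```python
from itertools import groupby, islice

# Rows of the "monday" partition in clustering order (family ASC, version DESC):
# (key, family, version, val)
ROWS = [
    ("monday", "revenue", 3, "$$$$$$"),
    ("monday", "revenue", 2, "$$$"),
    ("monday", "revenue", 1, "$$"),
    ("monday", "revenue", 0, "$"),
    ("monday", "traffic", 2, "medium"),
    ("monday", "traffic", 1, "light"),
    ("monday", "traffic", 0, "heavy"),
]


def latest_n_per_family(rows, n):
    """Hypothetical helper: keep the newest n versions of every family.

    Relies on the rows already being in clustering order: each family's rows
    are contiguous and sorted by version descending, so the first n rows of
    each group are its n most recent versions. Family names need not be
    known in advance, and n is chosen at run time.
    """
    result = {}
    for family, group in groupby(rows, key=lambda r: r[1]):
        result[family] = list(islice(group, n))
    return result


if __name__ == "__main__":
    for family, versions in latest_n_per_family(ROWS, 1).items():
        print(family, versions)
```

The trade-off is that this transfers every version of every family in the partition, far more than the N rows wanted when partitions are wide, in exchange for a single round trip instead of one query per family.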

Best regards,
Clint
