incubator-cassandra-user mailing list archives

From Jonathan Lacefield <jlacefi...@datastax.com>
Subject Re: Getting the most-recent version from time-series data
Date Wed, 26 Feb 2014 00:51:45 GMT
Clint,

   One approach would be to create a copy of this table with the
clustering columns switched so that version precedes family.  That way
you could easily grab the rows for the 1st, 2nd, ... Nth version.  Would
this help in your situation?
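
A minimal sketch of such a reordered copy, assuming the same columns as the
original table. The name time_series_stuff_by_version is hypothetical, the
storage options are left at their defaults, and the application would have to
write each row to both tables. Note that version is matched by number here, so
with the sample data (revenue is at version 3, traffic at version 2) a query on
version returns the rows that share a version number across families, which is
not necessarily the newest N rows of each family.

```
-- Hypothetical copy of time_series_stuff with version ahead of family.
CREATE TABLE time_series_stuff_by_version (
  key text,
  version int,
  family text,
  val text,
  PRIMARY KEY (key, version, family)
) WITH CLUSTERING ORDER BY (version DESC, family ASC);

-- All families at one specific version of a key.
SELECT * FROM time_series_stuff_by_version
 WHERE key = 'monday' AND version = 2;

-- All families at or above a given version, highest version first.
SELECT * FROM time_series_stuff_by_version
 WHERE key = 'monday' AND version >= 2;
```

Because version is now the first clustering column, both queries are served
from a single partition without knowing the family names in advance.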

Jonathan

> On Feb 25, 2014, at 7:49 PM, Clint Kelly <clint.kelly@gmail.com> wrote:
>
> Hi everyone,
>
> Let's say that I have a table that looks like the following:
>
> CREATE TABLE time_series_stuff (
>   key text,
>   family text,
>   version int,
>   val text,
>   PRIMARY KEY (key, family, version)
> ) WITH CLUSTERING ORDER BY (family ASC, version DESC) AND
>   bloom_filter_fp_chance=0.010000 AND
>   caching='KEYS_ONLY' AND
>   comment='' AND
>   dclocal_read_repair_chance=0.000000 AND
>   gc_grace_seconds=864000 AND
>   index_interval=128 AND
>   read_repair_chance=0.100000 AND
>   replicate_on_write='true' AND
>   populate_io_cache_on_flush='false' AND
>   default_time_to_live=0 AND
>   speculative_retry='99.0PERCENTILE' AND
>   memtable_flush_period_in_ms=0 AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   compression={'sstable_compression': 'LZ4Compressor'};
>
> cqlsh:fiddle> select * from time_series_stuff ;
>
>  key    | family  | version | val
> --------+---------+---------+--------
>  monday | revenue |       3 | $$$$$$
>  monday | revenue |       2 |    $$$
>  monday | revenue |       1 |     $$
>  monday | revenue |       0 |      $
>  monday | traffic |       2 | medium
>  monday | traffic |       1 |  light
>  monday | traffic |       0 |  heavy
>
> (7 rows)
>
> Now let's say that I'd like to perform a query that gets me the most recent N versions
> of "revenue" and "traffic."
>
> Is there a CQL query to do this?  Let's say that N=1.  Then I know that I can do:
>
> cqlsh:fiddle> select * from time_series_stuff where key='monday' and family='revenue' limit 1;
>
>  key    | family  | version | val
> --------+---------+---------+--------
>  monday | revenue |       3 | $$$$$$
>
> (1 rows)
>
> cqlsh:fiddle> select * from time_series_stuff where key='monday' and family='traffic' limit 1;
>
>  key    | family  | version | val
> --------+---------+---------+--------
>  monday | traffic |       2 | medium
>
> (1 rows)
>
> But what if I have lots of "families" and I want to get the most recent N versions of
> all of them in a single CQL statement?  Is that possible?  Unfortunately, I am working on
> something where the family names and the number of most-recent versions are not known
> a priori (I am porting some code that was designed for HBase).
>
> Best regards,
> Clint
