Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id E3E1310B53 for ; Wed, 26 Feb 2014 00:52:14 +0000 (UTC) Received: (qmail 31122 invoked by uid 500); 26 Feb 2014 00:52:11 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 31071 invoked by uid 500); 26 Feb 2014 00:52:11 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 31063 invoked by uid 99); 26 Feb 2014 00:52:11 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 26 Feb 2014 00:52:11 +0000 X-ASF-Spam-Status: No, hits=-0.7 required=5.0 tests=RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of jlacefield@datastax.com designates 209.85.128.169 as permitted sender) Received: from [209.85.128.169] (HELO mail-ve0-f169.google.com) (209.85.128.169) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 26 Feb 2014 00:52:06 +0000 Received: by mail-ve0-f169.google.com with SMTP id c14so1435574vea.28 for ; Tue, 25 Feb 2014 16:51:45 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:references:from:mime-version:in-reply-to:date :message-id:subject:to:content-type:content-transfer-encoding; bh=AcWIfrO1NxLVF901fQuU9oadCnPfgtxVn59Q69cW2F8=; b=TIQLcQ96Y3Y1/6Cgpbgqm3y7Hk2H8E1BJ8KUA/JSqpmVyhXBlSKf0ExhBPqxpwAEml QITvZ/TDag6+ZN1+Ks0AfhCw20QYZGh/v5g5jKF3885Ljy6DqLH7m9wf/iBirlQiqAQP WbOcQHl2TEf3Jhe/WQZtQM2DP/WDnd44JmGnW5iHHk/nuItcWlUdhRgu45NXq53vSYD5 j0kX6AYBfnb6TIxRI1whBoKDnyRnTkaifTQgh1Lut1Ufh+MXdvcUZAKOZDTzD1pyiJ1C 8u1861V/8Gbotwa/tMdahbTK2efcQgVvPA2IWmGDx16pfyP9YrABzXnZ6GtVbxpialSq iSTg== X-Gm-Message-State: ALoCoQlfEtb6MC0JYdJSEhuT8Zw3wdr6GXL/KOkQjowpQEZxEcHCipXwNqTwmIwWKUusYyaqSiFD X-Received: by 10.221.55.133 with SMTP id vy5mr3463836vcb.17.1393375904935; Tue, 25 Feb 2014 16:51:44 -0800 (PST) References: From: Jonathan Lacefield Mime-Version: 1.0 (1.0) In-Reply-To: Date: Tue, 25 Feb 2014 19:51:45 -0500 Message-ID: <4219401695378628553@unknownmsgid> Subject: Re: Getting the most-recent version from time-series data To: "user@cassandra.apache.org" Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Virus-Checked: Checked by ClamAV on apache.org Clint One approach would be to create a copy of this table and switch the clustering columns around so version precedes family. This way you could easily grab the 1st, 2nd, N version rows. Would this help you in your situation? Jonathan > On Feb 25, 2014, at 7:49 PM, Clint Kelly wrote: > > Hi everyone, > > Let's say that I have a table that looks like the following: > > CREATE TABLE time_series_stuff ( > key text, > family text, > version int, > val text, > PRIMARY KEY (key, family, version) > ) WITH CLUSTERING ORDER BY (family ASC, version DESC) AND > bloom_filter_fp_chance=3D0.010000 AND > caching=3D'KEYS_ONLY' AND > comment=3D'' AND > dclocal_read_repair_chance=3D0.000000 AND > gc_grace_seconds=3D864000 AND > index_interval=3D128 AND > read_repair_chance=3D0.100000 AND > replicate_on_write=3D'true' AND > populate_io_cache_on_flush=3D'false' AND > default_time_to_live=3D0 AND > speculative_retry=3D'99.0PERCENTILE' AND > memtable_flush_period_in_ms=3D0 AND > compaction=3D{'class': 'SizeTieredCompactionStrategy'} AND > compression=3D{'sstable_compression': 'LZ4Compressor'}; > > cqlsh:fiddle> select * from time_series_stuff ; > > key | family | version | val > --------+---------+---------+-------- > monday | revenue | 3 | $$$$$$ > monday | revenue | 2 | $$$ > monday | revenue | 1 | $$ > monday | revenue | 0 | $ > monday | traffic | 2 | medium > monday | traffic | 1 | light > monday | traffic | 0 | heavy > > (7 rows) > > Now let's say that I'd like to perform a query that gets me the most rece= nt N versions of "revenue" and "traffic." > > Is there a CQL query to do this? Let's say that N=3D1. Then I know that= I can do: > > cqlsh:fiddle> select * from time_series_stuff where key=3D'monday' and fa= mily=3D'revenue' limit 1; > > key | family | version | val > --------+---------+---------+-------- > monday | revenue | 3 | $$$$$$ > > (1 rows) > > cqlsh:fiddle> select * from time_series_stuff where key=3D'monday' and fa= mily=3D'traffic' limit 1; > > key | family | version | val > --------+---------+---------+-------- > monday | traffic | 2 | medium > > (1 rows) > > But what if I have lots of "families" and I want to get the most recent N= versions of all of them in a single CQL statement. Is that possible? Unf= ortunately I am working on something where the family names and the number = of most-recent versions are not known a priori (I am porting some code that= was designed for HBase). > > Best regards, > Clint