Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id CD0FBF260 for ; Tue, 7 May 2013 20:52:19 +0000 (UTC) Received: (qmail 43179 invoked by uid 500); 7 May 2013 20:52:17 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 43148 invoked by uid 500); 7 May 2013 20:52:17 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 43138 invoked by uid 99); 7 May 2013 20:52:17 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 07 May 2013 20:52:17 +0000 X-ASF-Spam-Status: No, hits=-2.3 required=5.0 tests=RCVD_IN_DNSWL_MED,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: local policy) Received: from [192.174.58.134] (HELO XEDGEA.nrel.gov) (192.174.58.134) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 07 May 2013 20:52:13 +0000 Received: from XHUBB.nrel.gov (10.20.4.59) by XEDGEA.nrel.gov (192.174.58.134) with Microsoft SMTP Server (TLS) id 8.3.298.1; Tue, 7 May 2013 14:51:40 -0600 Received: from MAILBOX2.nrel.gov ([fe80::19a0:6c19:6421:12f]) by XHUBB.nrel.gov ([::1]) with mapi; Tue, 7 May 2013 14:51:51 -0600 From: "Hiller, Dean" To: "user@cassandra.apache.org" Date: Tue, 7 May 2013 14:51:52 -0600 Subject: Re: CQL3 Data Model Question Thread-Topic: CQL3 Data Model Question Thread-Index: Ac5LZLJgsIOmn3XUT/Ko1NAUVmua3A== Message-ID: In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: user-agent: Microsoft-MacOutlook/14.3.2.130206 acceptlanguage: en-US Content-Type: text/plain; charset="iso-8859-2" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Virus-Checked: Checked by ClamAV on apache.org Playorm is not yet on CQL3 and cassandra doesn't work well with +10,000 CF's as we went down that path and cassandra can't cope, so we have one cassandra CF with 60,000 virtual CF's thanks to PlayOrm and a few other CF's. But yes, we bucket into hour or month or whatever depending on your rates and have an exact timestamp as well. That is one option. You can virtualize without playorm by just prefixing the rowkey with the device id each time and reversing that on reads of course. I am not sure if you need to partition by time or not after that as that dependson number of rows per device. Later, Dean On 5/7/13 2:42 PM, "Keith Wright" wrote: >So in that case I would create a different column family for each app id >and then a "time bucket" key as the row key with perhaps an hour >resolution? Something like this: > >create 123_table organic_events ( > hour timestamp, > event_id UUID, > app_id INT, > event_time TIMESTAMP, > user_id INT, > =A9. > PRIMARY KEY (hour, event_time, event_id) >) WITH CLUSTERING ORDER BY (event_time desc); > > > >Is this what others are doing? > > >On 5/7/13 4:18 PM, "Hiller, Dean" wrote: > >>We use PlayOrm to do 60,000 different streams which are all time series >>and use the virtual column families of PlayOrm so they are all in one >>column family. We then partition by time as well. I don't believe that >>we really have any hotspots from what I can tell. >> >>Dean >> >>From: Keith Wright > >>Reply-To: "user@cassandra.apache.org" >>> >>Date: Tuesday, May 7, 2013 2:02 PM >>To: "user@cassandra.apache.org" >>> >>Subject: CQL3 Data Model Question >> >>Hi all, >> >> I was hoping you could provide some assistance with a data modeling >>question (my apologies if a similar question has already been posed). I >>have time based data that I need to store on a per customer (aka app id ) >>basis so that I can easily return it in sorted order by event time. The >>data in question is being written at high volume (~50K / sec) and I am >>concerned about the cardinality of using either app id or event time as >>the row key as either will likely result in hot spots. Here are is the >>table definition I am considering: >> >>create table organic_events ( >>event_id UUID, >>app_id INT, >>event_time TIMESTAMP, >>user_id INT, >>=A9. >>PRIMARY KEY (app_id, event_time, event_id) >>) WITH CLUSTERING ORDER BY (app_id asc,event_time desc); >> >>So that I can be able to query as follows which will naturally sort the >>results by time descending: >> >>select * from organic_events where app_id =3D 1234 and event_time <=3D >>'2012-01-01' and event_time > '2012-01-01'; >> >>Anyone have an idea of the best way to accomplish this? I was >>considering the following: >> >> * Making the row key a concatenation of app id and 0-100 using a mod >>on event id to get the value. When getting data I would just fetch all >>keys given the mods (app_id in (1234_0,1234_1,1234_2, etc). This would >>alleviate the "hot" key issue but still seems expensive and a little >>hacky >> * I tried removing app_id from the primary key all together (using >>primary key of user_id, event_time, event_id) and making app_id a >>secondary index. I would need to sort by time on the client. The above >>query is valid however running a query is VERY slow as I believe it needs >>to fetch every row key that matches the index which is quite expensive (I >>get a timeout in cqlsh). >> * Create a different column family for each app id (I.e. >>1234_organic_events). Note that we could easily have 1000s of >>application ids. >> >>Thanks! >