incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Keith Wright <>
Subject Re: CQL3 Data Model Question
Date Tue, 07 May 2013 20:42:16 GMT
So in that case I would create a different column family for each app id
and then a "time bucket" key as the row key with perhaps an hour
resolution?  Something like this:

create 123_table organic_events (
   hour timestamp,
   event_id UUID,
   app_id INT,
   event_time TIMESTAMP,
   user_id INT,
   PRIMARY KEY (hour, event_time, event_id)
)  WITH CLUSTERING ORDER BY (event_time desc);

Is this what others are doing?

On 5/7/13 4:18 PM, "Hiller, Dean" <> wrote:

>We use PlayOrm to do 60,000 different streams which are all time series
>and use the virtual column families of PlayOrm so they are all in one
>column family.  We then partition by time as well.  I don't believe that
>we really have any hotspots from what I can tell.
>From: Keith Wright <<>>
>Reply-To: "<>"
>Date: Tuesday, May 7, 2013 2:02 PM
>To: "<>"
>Subject: CQL3 Data Model Question
>Hi all,
>    I was hoping you could provide some assistance with a data modeling
>question (my apologies if a similar question has already been posed).  I
>have time based data that I need to store on a per customer (aka app id )
>basis so that I can easily return it in sorted order by event time.  The
>data in question is being written at high volume (~50K / sec) and I am
>concerned about the cardinality of using either app id or event time as
>the row key as either will likely result in hot spots.  Here are is the
>table definition I am considering:
>create table organic_events (
>event_id UUID,
>app_id INT,
>event_time TIMESTAMP,
>user_id INT,
>PRIMARY KEY (app_id, event_time, event_id)
>)  WITH CLUSTERING ORDER BY (app_id asc,event_time desc);
>So that I can be able to query as follows which will naturally sort the
>results by time descending:
>select * from organic_events where app_id = 1234 and event_time <=
>'2012-01-01' and event_time > '2012-01-01';
>Anyone have an idea of the best way to accomplish this?  I was
>considering the following:
> *   Making the row key a concatenation of app id and 0-100 using a mod
>on event id to get the value.  When getting data I would just fetch all
>keys given the mods (app_id in (1234_0,1234_1,1234_2, etc).  This would
>alleviate the "hot" key issue but still seems expensive and a little hacky
> *   I tried removing app_id from the primary key all together (using
>primary key of user_id, event_time, event_id) and making app_id a
>secondary index.  I would need to sort by time on the client.  The above
>query is valid however running a query is VERY slow as I believe it needs
>to fetch every row key that matches the index which is quite expensive (I
>get a timeout in cqlsh).
> *   Create a different column family for each app id (I.e.
>1234_organic_events).  Note that we could easily have 1000s of
>application ids.

View raw message