incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hiller, Dean" <Dean.Hil...@nrel.gov>
Subject Re: CQL3 Data Model Question
Date Tue, 07 May 2013 20:18:24 GMT
We use PlayOrm to do 60,000 different streams which are all time series and use the virtual
column families of PlayOrm so they are all in one column family.  We then partition by time
as well.  I don't believe that we really have any hotspots from what I can tell.

Dean

From: Keith Wright <kwright@nanigans.com<mailto:kwright@nanigans.com>>
Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" <user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Date: Tuesday, May 7, 2013 2:02 PM
To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" <user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Subject: CQL3 Data Model Question

Hi all,

    I was hoping you could provide some assistance with a data modeling question (my apologies
if a similar question has already been posed).  I have time based data that I need to store
on a per customer (aka app id ) basis so that I can easily return it in sorted order by event
time.  The data in question is being written at high volume (~50K / sec) and I am concerned
about the cardinality of using either app id or event time as the row key as either will likely
result in hot spots.  Here are is the table definition I am considering:

create table organic_events (
event_id UUID,
app_id INT,
event_time TIMESTAMP,
user_id INT,
….
PRIMARY KEY (app_id, event_time, event_id)
)  WITH CLUSTERING ORDER BY (app_id asc,event_time desc);

So that I can be able to query as follows which will naturally sort the results by time descending:

select * from organic_events where app_id = 1234 and event_time <= '2012-01-01' and event_time
> '2012-01-01';

Anyone have an idea of the best way to accomplish this?  I was considering the following:

 *   Making the row key a concatenation of app id and 0-100 using a mod on event id to get
the value.  When getting data I would just fetch all keys given the mods (app_id in (1234_0,1234_1,1234_2,
etc).  This would alleviate the "hot" key issue but still seems expensive and a little hacky
 *   I tried removing app_id from the primary key all together (using primary key of user_id,
event_time, event_id) and making app_id a secondary index.  I would need to sort by time on
the client.  The above query is valid however running a query is VERY slow as I believe it
needs to fetch every row key that matches the index which is quite expensive (I get a timeout
in cqlsh).
 *   Create a different column family for each app id (I.e. 1234_organic_events).  Note that
we could easily have 1000s of application ids.

Thanks!

Mime
View raw message