cassandra-user mailing list archives

From Jack Krupansky <jack.krupan...@gmail.com>
Subject Re: Timeseries analysis using Cassandra and partition by date period
Date Sat, 04 Apr 2015 12:41:21 GMT
It depends on the actual number of events per user, but adding a time
bucket to the partition key gives you the same effect as HBase's per-month
column families: rows are grouped into separate partitions by time range.
A composite partition key could be composed of the user ID and the date.
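
A minimal sketch of that idea, assuming one partition per user per UTC day.
The table name `events_by_user_day`, the `day` bucket column and the helper
functions are hypothetical, not from this thread. The client computes the
bucket, and a time-range query is split into one single-partition query per
bucket the range covers:

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

# Hypothetical table: composite partition key (user_id, day), rows clustered
# by event_ts. As with the original PRIMARY KEY (user_id, event_ts), two events
# for the same user with the same event_ts overwrite each other.
DDL = """CREATE TABLE events_by_user_day (
  user_id timeuuid,
  day text,
  event_ts timestamp,
  id timeuuid,
  event_type int,
  some_other_attr text,
  PRIMARY KEY ((user_id, day), event_ts)
);"""

# Range query against one partition; '?' marks bind parameters
# for a prepared statement in whatever driver you use.
RANGE_QUERY = ("SELECT * FROM events_by_user_day "
               "WHERE user_id = ? AND day = ? AND event_ts >= ? AND event_ts <= ?")


def day_bucket(ts: datetime) -> str:
    """Return the UTC day bucket ('YYYY-MM-DD') a timezone-aware timestamp falls into."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


def buckets_for_range(start: datetime, end: datetime) -> List[str]:
    """List every UTC day bucket touched by the inclusive range [start, end]."""
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    buckets = []
    while day <= last:
        buckets.append(day.isoformat())
        day += timedelta(days=1)
    return buckets


def range_query_params(user_id, start: datetime, end: datetime) -> List[Tuple]:
    """One bind-parameter tuple per bucket, each to be executed with RANGE_QUERY.

    The event_ts bounds only filter rows in the first and last buckets; for
    the buckets in between they are redundant but harmless.
    """
    return [(user_id, b, start, end) for b in buckets_for_range(start, end)]


if __name__ == "__main__":
    start = datetime(2015, 1, 30, 22, 0, tzinfo=timezone.utc)
    end = datetime(2015, 2, 2, 1, 0, tzinfo=timezone.utc)
    print(buckets_for_range(start, end))
    # ['2015-01-30', '2015-01-31', '2015-02-01', '2015-02-02']
```

Each query reads only one partition, so no full table scan is needed; a long
range simply means more partitions to read, which the client can issue in
parallel.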

It also depends on the data rate: is it many events per day, or just a few
events per week, and over what time period? You need to be careful: you
don't want your Cassandra partitions to be too big (millions of rows) or
too small (just a few rows, or even one row, per partition).
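
A rough way to pick the bucket width is to estimate rows per partition from
the per-user event rate. This sketch assumes a hypothetical ceiling of
100,000 rows per partition (the thread only says that millions is too big
and a few is too small) and picks the widest bucket that stays under it,
since wider buckets mean fewer partitions per range query. The bucket names
and widths are illustrative assumptions:

```python
from typing import Dict

# Candidate bucket widths, as the maximum number of days each can span
# (hypothetical choices).
BUCKET_DAYS: Dict[str, int] = {"day": 1, "week": 7, "month": 31, "year": 366}

# Hypothetical ceiling on rows per partition; tune it for your row size.
MAX_ROWS_PER_PARTITION = 100_000


def rows_per_partition(events_per_day: float, bucket_days: int) -> float:
    """Expected rows in one (user, bucket) partition at a steady event rate."""
    return events_per_day * bucket_days


def choose_bucket(events_per_day: float, max_rows: int = MAX_ROWS_PER_PARTITION) -> str:
    """Pick the widest bucket whose expected size stays at or under max_rows.

    Falls back to 'day' if even one day exceeds the limit; in that case a
    finer bucket (e.g. hourly) would be needed.
    """
    best = "day"
    for name, days in sorted(BUCKET_DAYS.items(), key=lambda kv: kv[1]):
        if rows_per_partition(events_per_day, days) <= max_rows:
            best = name
    return best
```

For example, `choose_bucket(1000)` (about 1,000 events per user per day)
gives `'month'`, `choose_bucket(14_000)` gives `'week'`, and a user with a
few events per week (`choose_bucket(0.5)`) gets `'year'`.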

-- Jack Krupansky

On Sat, Apr 4, 2015 at 7:03 AM, Serega Sheypak <serega.sheypak@gmail.com>
wrote:

> Hi, I switched from HBase to Cassandra and am trying to find a solution
> for time-series analysis on top of Cassandra.
> I have an entity named "Event".
> "Event" has these attributes:
> user_id - the user who triggered the event
> event_ts - when the event happened
> event_type - the type of event
> some_other_attr - other attributes we don't care about right now.
>
> The DDL for the Event entity looks like this:
>
> CREATE TABLE user_plans (
>   id timeuuid,
>   user_id timeuuid,
>   event_ts timestamp,
>   event_type int,
>   some_other_attr text,
>   PRIMARY KEY (user_id, event_ts)
> );
>
> The table is "infinite": it will grow continuously during the
> application's lifetime.
> I want to ask this question:
> Cassandra, give me all events where event_ts >= xxx and event_ts <= yyy.
>
> Right now that would lead to a full table scan.
>
> There is a trick in HBase. HBase has a table abstraction and a
> column family abstraction.
> Column families must be declared in advance.
> Physically, a column family is a set of HFiles ("SSTables" in C*).
> So I can easily add partitioning to my HBase table:
> alter table hbase_events add column family '2015_01'
> and store all January 2015 data in the column family named '2015_01'.
>
> When I want January data, I directly access the column family
> named '2015_01' and scan only that piece, not all the data in the table.
>
> What is the approach in C* in this case?
> I had the idea of creating several tables: event_2015_01, event_2015_02,
> etc., but from my current understanding of how C* works, that looks
> rather ugly.
