cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Thunder Stumpges <thunder.stump...@gmail.com>
Subject Re: Help on Designing Cassandra table for my usecase
Date Thu, 09 Jan 2014 21:03:30 GMT
Well I think you have essentially time-series data, which C* should handle
well, however I think your "Tag" column is going to cause troubles. C* does
have collection columns, but they are not indexable nor usable in WHERE
clause. Your example has both the uniqueness of the data (primary key) and
query filtering on potentially multiple "Tag" columns. That is not
supported in C* AFAIK.If it were a single Tag, that could be a column that
is Indexed possibly.

Ignoring that issue with the many different Tags, You could model the table
as:

CREATE TABLE metric_data (
  metric text,
  time text,
  period text,
  tag text,
  value int,
  PRIMARY KEY( (metric,time), period, tag)
)

That would make a composite partitioning key on metric and time meaning
you'd always have to pass those (or else randomly page via TOKEN through
all rows). After specifying metric and time, you could optionally also
specify period and/or tag, and results would be ordered (clustered) by
period. This would satisfy your queries a,b, and d but not c (as you did
not specify time). If Time was a granularity column, does it even make
sense to return records across differing time values? What does it mean to
return the 4 month rows and 1 year row in your example? Could you issue N
queries in this case (where N is a small number of each of your time
granularities) ?

I'm not sure how close that gets you, or if you can re-work your concept of
Tag at all.
Good luck.
Thunder



On Thu, Jan 9, 2014 at 10:45 AM, Hannu Kröger <hkroger@gmail.com> wrote:

> To my eye that looks something what the traditional analytics systems do.
> You can check out e.g. Acunu Analytics which uses Cassandra as a backend.
>
> Cheers,
> Hannu
>
>
> 2014/1/9 Naresh Yadav <nyadav.ait@gmail.com>
>
>> Hi all,
>>
>> I have a use case with huge data which i am not able to design in
>> cassandra.
>>
>> Table name : MetricResult
>>
>> Sample Data :
>>
>> Metric=Sales, Time=Month,  Period=Jan-10, Tag=U.S.A, Tag=Pen,     Value=10
>> Metric=Sales, Time=Month, Period=Jan-10, Tag=U.S.A, Tag=Pencil,  Value=20
>> Metric=Sales, Time=Month, Period=Feb-10, Tag=U.S.A, Tag=Pen,     Value=30
>> Metric=Sales, Time=Month, Period=Feb-10, Tag=U.S.A, Tag=Pencil,  Value=10
>> Metric=Sales, Time=Month, Period=Feb-10, Tag=India,
>> Value=90
>> Metric=Sales, Time=Year, Period=2010,       Tag=U.S.A,
>> Value=70
>> Metric=Cost,  Time=Year, Period=2010,    Tag=CPU,
>> Value=8000
>> Metric=Cost,  Time=Year,  Period=2010,    Tag=RAM,
>> Value=4000
>> Metric=Cost,  Time=Year  Period=2011,     Tag=CPU,
>> Value=9000
>> Metric=Resource, Time=Week Period=Week1-2013,
>> Value=100
>>
>> So in above case i have case of
>>          TimeSeries data  i.e Time,Period column
>>          Dynamic columns i.e Tag column
>>          Indexing on dynamic columns i.e Tag column
>>          Aggregations SUM, AVERAGE
>>          Same value comes again for a Metric, Time, Period, Tag then
>> overwrite it
>>
>> Queries i need to support :
>> --------------------------------------
>> a)Give data for Metric=Sales AND Time=Month
>>        O/P : 5 rows
>> b)Give data for Metric=Sales AND Time=Month AND Period=Jan-10
>>        O/P : 2 rows
>> c)Give data for Metric=Sales AND Tag=U.S.A
>>        O/P : 5 rows
>> d)Give data for Metric=Sales AND Period=Jan-10 AND Tag=U.S.A AND Tag=Pen
>>        O/P :1 row
>>
>>
>> This table can have TB's of data and for a Metric,Period can have
>> millions of rows.
>>
>> Please give suggestion to design/model this table in Cassandra. If some
>> limitation in Cassandra then suggest best technology to handle this.
>>
>>
>> Thanks
>> Naresh
>>
>
>

Mime
View raw message