@Thunder

I just came to know about CASSANDRA-4511, which allows indexes on collections and will be part of release 2.1.

I hope that in that case my problem will be solved by changing your designed table so that the tag column is a set<text> with a secondary index defined on it. Is there any risk of performance problems with this design, keeping in mind the huge data volume?

Naresh

On Fri, Jan 10, 2014 at 10:26 AM, Naresh Yadav <nyadav.ait@gmail.com> wrote:

@Thunder, thanks for suggesting a design, but my main problem is indexing/querying the dynamic Tag on each row; it is the main context of each row, and most queries will include it.

As an alternative to Cassandra, I tried Apache Blur. In a Blur table I am able to store exactly the same data, and all the queries worked, so Blur allows dynamic indexing of the tag column. BUT by moving away from Cassandra I am losing its strengths, and because of that I am not confident in this decision, as the data will be huge in my case.

Please guide me on this with better suggestions.

Thanks
Naresh

On Fri, Jan 10, 2014 at 2:33 AM, Thunder Stumpges <thunder.stumpges@gmail.com> wrote:

Well, I think you have essentially time-series data, which C* should handle well; however, I think your "Tag" column is going to cause trouble. C* does have collection columns, but they are not indexable nor usable in a WHERE clause. Your example has both the uniqueness of the data (primary key) and query filtering on potentially multiple "Tag" columns. That is not supported in C* AFAIK. If it were a single Tag, that could possibly be an indexed column.

Ignoring that issue with the many different Tags, you could model the table as:

CREATE TABLE metric_data (
  metric text,
  time text,
  period text,
  tag text,
  value int,
  PRIMARY KEY ((metric, time), period, tag)
);

That would make a composite partitioning key on metric and time, meaning you'd always have to pass those (or else randomly page via TOKEN through all rows). After specifying metric and time, you could optionally also specify period and/or tag, and results would be ordered (clustered) by period. This would satisfy your queries a, b, and d but not c (as you did not specify time). If Time is a granularity column, does it even make sense to return records across differing time values? What does it mean to return the 4 month rows and 1 year row in your example? Could you issue N queries in this case (where N is the small number of time granularities you have)?
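To make the access pattern concrete, here is a minimal in-memory Python sketch of the layout described above: rows are grouped by the (metric, time) partition key and kept sorted by (period, tag) inside each partition. It is only a model of the idea, not Cassandra code; the names `partitions`, `insert`, `query` and `query_all_times` are made up for illustration, and the filtering is done in plain Python, so it does not model which WHERE clauses CQL actually accepts. It uses only the single-tag rows from the sample data.

```python
from bisect import insort

# (metric, time) -> list of (period, tag, value), kept sorted by (period, tag)
partitions = {}

def insert(metric, time, period, tag, value):
    """Upsert one row; the full primary key is (metric, time, period, tag)."""
    rows = partitions.setdefault((metric, time), [])
    # Drop any existing row with the same clustering key so the new value wins.
    rows[:] = [r for r in rows if (r[0], r[1]) != (period, tag)]
    insort(rows, (period, tag, value))

def query(metric, time, period=None, tag=None):
    """Read one partition; metric and time are mandatory, the rest optional."""
    rows = partitions.get((metric, time), [])
    return [r for r in rows
            if (period is None or r[0] == period)
            and (tag is None or r[1] == tag)]

def query_all_times(metric, times, tag):
    """Query c without a time: issue one query per known time granularity."""
    out = []
    for time in times:
        out.extend((time,) + r for r in query(metric, time, tag=tag))
    return out

# Single-tag rows taken from the sample data in this thread.
insert("Sales", "Month", "Feb-10", "India", 90)
insert("Sales", "Year", "2010", "U.S.A", 70)
insert("Cost", "Year", "2011", "CPU", 9000)
insert("Cost", "Year", "2010", "RAM", 4000)
insert("Cost", "Year", "2010", "CPU", 8000)

print(query("Cost", "Year"))                 # whole partition, ordered by period, then tag
print(query("Cost", "Year", period="2010"))  # narrowed by the first clustering column
print(query_all_times("Sales", ["Month", "Year", "Week"], tag="U.S.A"))
```

The last call is the "N queries" idea: the caller supplies the short list of time granularities and the results are merged client-side.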

I'm not sure how close that gets you, or if you can re-work your concept of Tag at all.
Good luck.
Thunder

On Thu, Jan 9, 2014 at 10:45 AM, Hannu Kröger <hkroger@gmail.com> wrote:

To my eye, that looks like something traditional analytics systems do. You can check out, e.g., Acunu Analytics, which uses Cassandra as a backend.

Cheers,
Hannu

2014/1/9 Naresh Yadav <nyadav.ait@gmail.com>

Hi all,

I have a use case with huge data which I am not able to design in Cassandra.

Table name : MetricResult

Sample Data :

Metric=Sales,    Time=Month, Period=Jan-10,     Tag=U.S.A, Tag=Pen,    Value=10
Metric=Sales,    Time=Month, Period=Jan-10,     Tag=U.S.A, Tag=Pencil, Value=20
Metric=Sales,    Time=Month, Period=Feb-10,     Tag=U.S.A, Tag=Pen,    Value=30
Metric=Sales,    Time=Month, Period=Feb-10,     Tag=U.S.A, Tag=Pencil, Value=10
Metric=Sales,    Time=Month, Period=Feb-10,     Tag=India,             Value=90
Metric=Sales,    Time=Year,  Period=2010,       Tag=U.S.A,             Value=70
Metric=Cost,     Time=Year,  Period=2010,       Tag=CPU,               Value=8000
Metric=Cost,     Time=Year,  Period=2010,       Tag=RAM,               Value=4000
Metric=Cost,     Time=Year,  Period=2011,       Tag=CPU,               Value=9000
Metric=Resource, Time=Week,  Period=Week1-2013,                        Value=100

So in the above case I have:
        TimeSeries data, i.e. the Time and Period columns
        Dynamic columns, i.e. the Tag column
        Indexing on dynamic columns, i.e. the Tag column
        Aggregations SUM, AVERAGE
        If a value comes again for the same Metric, Time, Period, Tag, then overwrite it
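The overwrite and aggregation requirements can be pictured with a small in-memory Python sketch. This is only a model of the intended semantics, not a Cassandra design; `store`, `put` and `aggregate` are hypothetical names, and treating the tags of a row as a frozenset inside the key is an assumption about what "same Tag" means.

```python
# key: (metric, time, period, frozenset of tags) -> value
store = {}

def put(metric, time, period, tags, value):
    """Writing the same Metric, Time, Period, Tag combination again overwrites."""
    store[(metric, time, period, frozenset(tags))] = value

def aggregate(metric, time):
    """Return (SUM, AVERAGE) of the values for one metric and time granularity."""
    values = [v for (m, t, _period, _tags), v in store.items()
              if m == metric and t == time]
    total = sum(values)
    return total, (total / len(values) if values else None)

# Cost rows from the sample data; the last write repeats an existing key.
put("Cost", "Year", "2010", ["CPU"], 8000)
put("Cost", "Year", "2010", ["RAM"], 4000)
put("Cost", "Year", "2011", ["CPU"], 9000)
put("Cost", "Year", "2011", ["CPU"], 9000)  # same key again: still three rows

print(len(store), aggregate("Cost", "Year"))
```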

Queries I need to support :
--------------------------------------

a) Give data for Metric=Sales AND Time=Month
       O/P : 5 rows
b) Give data for Metric=Sales AND Time=Month AND Period=Jan-10
       O/P : 2 rows
c) Give data for Metric=Sales AND Tag=U.S.A
       O/P : 5 rows
d) Give data for Metric=Sales AND Period=Jan-10 AND Tag=U.S.A AND Tag=Pen
       O/P : 1 row
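As a sanity check of the expected row counts, the sample data and the four queries can be modelled in memory. The Python sketch below is not Cassandra code; it assumes each row carries a set of tags and that "Tag=X AND Tag=Y" means the row must contain both. `ROWS` and `select` are names invented for this illustration.

```python
# Each row: (metric, time, period, set of tags, value), copied from the sample data.
ROWS = [
    ("Sales", "Month", "Jan-10", {"U.S.A", "Pen"}, 10),
    ("Sales", "Month", "Jan-10", {"U.S.A", "Pencil"}, 20),
    ("Sales", "Month", "Feb-10", {"U.S.A", "Pen"}, 30),
    ("Sales", "Month", "Feb-10", {"U.S.A", "Pencil"}, 10),
    ("Sales", "Month", "Feb-10", {"India"}, 90),
    ("Sales", "Year", "2010", {"U.S.A"}, 70),
    ("Cost", "Year", "2010", {"CPU"}, 8000),
    ("Cost", "Year", "2010", {"RAM"}, 4000),
    ("Cost", "Year", "2011", {"CPU"}, 9000),
    ("Resource", "Week", "Week1-2013", set(), 100),
]

def select(metric, time=None, period=None, tags=()):
    """Return rows matching the metric, the optional filters, and all given tags."""
    required = set(tags)
    return [r for r in ROWS
            if r[0] == metric
            and (time is None or r[1] == time)
            and (period is None or r[2] == period)
            and required <= r[3]]  # subset test: every requested tag is on the row

a = select("Sales", time="Month")
b = select("Sales", time="Month", period="Jan-10")
c = select("Sales", tags=["U.S.A"])
d = select("Sales", period="Jan-10", tags=["U.S.A", "Pen"])
print(len(a), len(b), len(c), len(d))  # prints: 5 2 5 1
```

Query c spans both the Month and the Year rows, which is why it is the hard one for any layout partitioned by Time.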

This table can hold TBs of data, and a single Metric, Period combination can have millions of rows.

Please suggest how to design/model this table in Cassandra. If there is some limitation in Cassandra, then please suggest the best technology to handle this.

Thanks

Naresh