incubator-cassandra-user mailing list archives

From "Hiller, Dean" <Dean.Hil...@nrel.gov>
Subject Re: Seeking suggestions for a use case
Date Tue, 12 Feb 2013 22:41:54 GMT
We are open sourcing the system so I don't mind at all.

We are using patterns from this web page
https://github.com/deanhiller/playorm/wiki/Patterns-Page

Realize PlayOrm is doing a huge amount of heavy lifting for us with its virtual tables and
partitioning.  This will be hard to explain but I will give it a shot.

We have one column family called "data".  We have 60,000 virtual tables in that CF (PlayOrm
prefixes every key with the table name, so short table names for our data are good).  So thus
far, we simply have

Data
rowKey = CompositeKey(virtual tablename, time since epoch)

Next, we haven't done this just yet, but we are going to partition each virtual table.  (In
PlayOrm, a partition is not like a Cassandra partition; one partition is spread across the cluster
much like a virtual table.)  For this we just add a special column, the partitioned column (in
PlayOrm, we just annotate a field and it partitions it for us).

So we have

ColumnFamily="Data"
rowKey = CompositeKey(virtual tablename, time since epoch)
ColumnName="PartitionTimeKey" / ColumnValue= (value of time at beginning of the month)
….(the rest is names of the columns with data and their data)
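
The layout above can be sketched in plain Python. This is only an illustration of the key and partition-column shapes, not PlayOrm's API: the names `row_key`, `partition_time_key`, and `make_row` are hypothetical, and the choice of milliseconds and UTC is an assumption, since the message only says "time since epoch".

```python
from datetime import datetime, timezone


def row_key(virtual_table: str, epoch_millis: int) -> tuple:
    """Hypothetical stand-in for CompositeKey(virtual tablename, time since epoch)."""
    return (virtual_table, epoch_millis)


def partition_time_key(epoch_millis: int) -> int:
    """Epoch millis of the first instant (UTC, assumed) of the month containing epoch_millis."""
    t = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    month_start = datetime(t.year, t.month, 1, tzinfo=timezone.utc)
    return int(month_start.timestamp() * 1000)


def make_row(virtual_table: str, epoch_millis: int, columns: dict) -> tuple:
    """Build (row key, columns) with the partition column added next to the data columns."""
    row = {"PartitionTimeKey": partition_time_key(epoch_millis)}
    row.update(columns)
    return row_key(virtual_table, epoch_millis), row
```

Every row written during the same month ends up with the same "PartitionTimeKey" value, which is what groups rows into a monthly partition.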

As you can see, since we only deal in looking up rowKeys at this point, we are completely scalable
to infinity and beyond.  Now, behind the scenes PlayOrm is creating some indexes for us, so
that a partition can scale up to somewhat fewer than 10 million rows.  Let's say we have 100,000 rows in
the above model where all rows are in virtualTable=deansTemperature AND let's say those same
rows are all in the same partition for the month of February.  There is a single wide row (created
by PlayOrm, not me directly) like so

ColumnFamily="IntegerIndex"
rowKey=Composite(<virtual tablename>, "PartitionTimeKey", <begin of February time>)
column1Name=Composite(<time1>, <rowKeyToData98>)
column2Name=Composite(<time2>, <rowKeyToData56>)
……This is a very wide row WITH NO VALUES…..all information is in the column names!!!!
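
A minimal sketch of why a value-less wide row works as an index, assuming only that column names are kept sorted (as Cassandra does within a row). The `IndexRow` class is hypothetical and is not how PlayOrm implements it; it just shows that a time-range lookup becomes a slice over sorted composite column names.

```python
import bisect


class IndexRow:
    """One wide index row: sorted composite column names, no column values."""

    def __init__(self):
        self._names = []  # sorted list of (time, data_row_key) tuples

    def add(self, time, data_row_key):
        # Insert while keeping the column names in sorted order.
        bisect.insort(self._names, (time, data_row_key))

    def range(self, start_time, end_time):
        """Return data row keys whose time falls in [start_time, end_time)."""
        # A 1-tuple (t,) sorts before every (t, key), so it marks the first column at time t.
        lo = bisect.bisect_left(self._names, (start_time,))
        hi = bisect.bisect_left(self._names, (end_time,))
        return [key for _, key in self._names[lo:hi]]


# The index is keyed the same way as the rowKey shown above (values here are made up).
index = {}
february = ("deansTemperature", "PartitionTimeKey", 1359676800000)
row = index.setdefault(february, IndexRow())
row.add(1360000000000, "rowKeyToData98")
row.add(1360000060000, "rowKeyToData56")
matching_keys = row.range(1360000000000, 1360000030000)
```

The keys returned by `range` are then used for ordinary rowKey lookups in the "Data" column family.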

The "PartitionTimeKey" is not strictly necessary, but PlayOrm allows me to partition in different
directions, so it would be needed if I used multiple partition types; we don't use that.

I hope that makes sense.  I am never sure whether I am being clear enough or not as I don't know
how much noSQL you know…….if you are just getting started, you need to read up on the
Composite column names pattern and wide rows.  The link above is general noSQL patterns mixed
from a PlayOrm point of view but it tries to explain the underlying noSQL pattern each time.

Thanks,
Dean




From: Boris Solovyov <boris.solovyov@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, February 12, 2013 2:56 PM
To: user <user@cassandra.apache.org>
Subject: Re: Seeking suggestions for a use case

Would you mind sharing your schema on the list? It would be useful to see how you modeled
your data. Or you could email me privately if you want.

Thanks
Boris


On Tue, Feb 12, 2013 at 4:11 PM, Hiller, Dean <Dean.Hiller@nrel.gov> wrote:
Yes, the limit on the width of a row is somewhere in the millions, perhaps lower than
10 million.  We plan to go well above that in our use case ;).  Our widest row for indexing
right now is only around 200,000 columns and we have been in production one month (at 10 years
that would be about 24 million).  They want 10 years of data at the very least and they constantly
have researchers working with the data sets from all times.
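
The arithmetic in the quoted message can be checked with a few lines. This is a rough sketch that assumes the index grows linearly at the first month's rate; the constant names are made up for illustration.

```python
COLUMNS_PER_MONTH = 200_000  # widest index row after one month in production
MONTHS = 10 * 12             # ten years of retention
ROW_LIMIT = 10_000_000       # rough upper bound on row width from the message

# Width of a single index row if it were never partitioned.
unpartitioned_width = COLUMNS_PER_MONTH * MONTHS


def fits(width, limit=ROW_LIMIT):
    """True if a row of this width stays under the rough limit."""
    return width < limit


print(unpartitioned_width, fits(unpartitioned_width), fits(COLUMNS_PER_MONTH))
```

Under that assumption, one unpartitioned index row would pass the limit well before ten years, while a row covering a single month stays far below it.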

