At 600 CF's I would expect to see very frequent flushing to disk, as the algorithm that drives this from a memory standpoint is sensitive to the number of CF's.
Additionally, and from experience on earlier versions, you can expect it to take over half an hour to make schema changes to over 500 CFs.
Finally, if you ever have performance problems, they are a lot harder to diagnose on a system with 600 CF's than on one with 60.
Hope that helps.
Co-Founder & Principal Consultant
Apache Cassandra Consulting
I don't know the full use case. However, for a generic time series scenario, we can make the timestamp (maybe truncated to the second) part of the key and write all the data into the same CF (one CF for all data). Again, it may not make sense in your case, given the full use case. Just my 2 cents.
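A minimal sketch of that idea in Python, with no Cassandra driver involved. The function names, the `source_id` prefix and the `:` separator are made up for illustration; the only point is that the second becomes part of the row key, so every write lands in one CF.

```python
from datetime import datetime, timezone


def second_bucket(ts: datetime) -> int:
    """Truncate a timestamp to whole seconds since the Unix epoch."""
    return int(ts.timestamp())


def make_row_key(source_id: str, ts: datetime) -> str:
    """Build a row key for a single shared CF (hypothetical key layout).

    The second is carried in the key instead of in the CF name.
    """
    return f"{source_id}:{second_bucket(ts)}"


def slot_in_window(ts: datetime, window_seconds: int = 600) -> int:
    """Position of the timestamp's second inside a rolling window.

    With the default of 600 this yields a value from 0 to 599, which is
    the role the 600 per-second CFs would otherwise play.
    """
    return second_bucket(ts) % window_seconds


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(make_row_key("sensor-1", now), slot_in_window(now))
```

Two writes that arrive within the same second get the same key, so they end up as columns in the same row rather than in a CF of their own.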
Thanks and Regards,
On Sep 26, 2013, at 11:18 AM, "Hiller, Dean" <Dean.Hiller@nrel.gov> wrote:
600 is probably doable, but each CF takes up memory. PlayOrm goes with a strategy that can virtualize CF's into one CF, allowing less memory usage. We have 80,000 virtual CF's in cassandra through playorm, and you can copy playorm's pattern if desired. So 600 is probably doable but high. 10,000 is not very doable.
But you would have to try out 600 to see if it works for you. It may not work, so try it and find out in your load and context.
NOTE: We have changed the 80,000 virtual CF's so that they are in 10 real CF's these days, so we get more parallel compaction going on.
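One way to picture the virtual CF idea is sketched below in Python. This is not PlayOrm's actual code or API. The key prefix, the `data_N` naming and the CRC32-based mapping onto 10 real CF's are all assumptions made for the sketch, since the thread does not say how virtual CF's are assigned to real ones.

```python
import zlib

SEPARATOR = ":"  # hypothetical separator between virtual CF name and row key


def virtual_row_key(virtual_cf: str, row_key: str) -> str:
    """Prefix the row key with the virtual CF name (assumed layout)."""
    if SEPARATOR in virtual_cf:
        raise ValueError("virtual CF name must not contain the separator")
    return f"{virtual_cf}{SEPARATOR}{row_key}"


def real_cf_for(virtual_cf: str, real_cf_count: int = 10) -> str:
    """Pick one of a small, fixed set of real CF's for a virtual CF.

    CRC32 is used only because it is stable across processes; the
    mapping itself is an assumption, not something the thread states.
    """
    index = zlib.crc32(virtual_cf.encode("utf-8")) % real_cf_count
    return f"data_{index}"


if __name__ == "__main__":
    vcf = "second_042"
    print(real_cf_for(vcf), virtual_row_key(vcf, "sensor-1"))
```

With a layout like this, the number of real CF's (and so the memory cost per CF) stays fixed no matter how many virtual CF's exist.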
From: Raihan Jamal <firstname.lastname@example.org<mailto:email@example.com>>
Reply-To: "firstname.lastname@example.org<mailto:email@example.com>" <firstname.lastname@example.org<mailto:email@example.com>>
Date: Thursday, September 26, 2013 11:39 AM
To: "firstname.lastname@example.org<mailto:email@example.com>" <firstname.lastname@example.org<mailto:email@example.com>>
Subject: How many Column Families can Cassandra handle?
I am working on a use case for time series data. I have been told to create 600 column families in Cassandra, one for each second of a 10-minute window (10 minutes is 600 seconds).
In each second, we will write into that second's column family, so at the end of the 10 minutes we will be writing into the 600th column family.
I am wondering whether Cassandra will be able to handle 600 column families. Right now, I am not sure how much data each column family will hold. What I know so far is that writes will arrive at a rate of 20,000 per second.
Can anyone shed some light on this?