cassandra-user mailing list archives

From "Hiller, Dean" <>
Subject Re: 10,000s of column families/keyspaces
Date Mon, 01 Jul 2013 16:48:25 GMT
Oh, and if you are using STCS, I don't think the issue below applies at all,
since STCS compactions can already run in parallel when needed.


On 7/1/13 10:24 AM, "Hiller, Dean" <> wrote:

>We use playorm to run 80,000 virtual column families (a playorm feature,
>though the pattern could be copied).  We found out later, and are working
>on this now, that we want to map the 80,000 virtual CFs onto 10 real CFs
>so that leveled compaction can run more in parallel; otherwise we get
>stuck with single-threaded LCS at the last tier, which can take a
>while.  We are about to map/reduce our dataset into our newest format.
>
>From: Kirk True <<>>
>Reply-To: "<>"
>Date: Monday, July 1, 2013 10:19 AM
>To: "<>"
>Subject: 10,000s of column families/keyspaces
>
>Hi all,
>I know it's an old topic, but I want to see if anything's changed on the
>number of column families that C* supports, either in 1.2.x or 2.x.
>For a number of reasons [1], we'd like to support multi-tenancy via
>separate column families. The "problem" is that there are around 5,000
>tenants to support, and each one needs a small handful of column families.
>The last I heard C* supports 'a couple of hundred' column families before
>things start to bog down.
>What will it take for C* to support 50,000 column families?
>I'm about to dive into the code and run some tests, but I was curious
>about how to quantify the overhead of a column family. Is the reason
>performance? Memory? Does the off-heap work help here?
>[1] The main three reasons:
> 1.  ability to wholesale drop data for a given tenant via drop
>keyspace/drop CFs
> 2.  ability to have divergent schema for each tenant (partially affected
>by DSE Solr integration)
> 3.  secondary indexes per tenant (given requirement #2)
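
The virtual column family pattern described in the reply above (80,000 virtual CFs mapped onto 10 real CFs) can be sketched roughly as follows. This is only an illustration of the general idea, not playorm's actual implementation: the function names, the `data_N` naming of the real CFs, the CRC32 hash, and the `:` separator are all assumptions made for this example.

```python
import zlib

# From the thread: 80,000 virtual CFs are mapped onto 10 real CFs.
REAL_CF_COUNT = 10


def real_cf_for(virtual_cf: str) -> str:
    """Deterministically pick one of the real CFs for a virtual CF.

    Hypothetical scheme: CRC32 of the virtual CF name, modulo the
    number of real CFs. Any stable hash would work the same way.
    """
    bucket = zlib.crc32(virtual_cf.encode("utf-8")) % REAL_CF_COUNT
    return f"data_{bucket}"


def physical_row_key(virtual_cf: str, row_key: str) -> str:
    """Prefix the row key with the virtual CF name so that rows from
    different virtual CFs sharing one real CF cannot collide.

    Assumes virtual CF names never contain the ':' separator.
    """
    return f"{virtual_cf}:{row_key}"


def locate(virtual_cf: str, row_key: str) -> tuple[str, str]:
    """Return (real CF name, physical row key) for a logical row."""
    return real_cf_for(virtual_cf), physical_row_key(virtual_cf, row_key)


if __name__ == "__main__":
    # Example: two tenants with their own virtual "users" CF.
    for vcf in ("tenant42_users", "tenant43_users"):
        print(vcf, "->", locate(vcf, "alice"))
```

One trade-off of a layout like this: because several virtual CFs share one real CF, reason 1 in the original question (dropping a tenant's data wholesale via drop CF) no longer applies directly, and removing a tenant would mean deleting that tenant's rows instead.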
