incubator-cassandra-user mailing list archives

From "Hiller, Dean" <Dean.Hil...@nrel.gov>
Subject Re: 10,000s of column families/keyspaces
Date Mon, 01 Jul 2013 16:48:25 GMT
Oh, and if you are using STCS, I don't think the issue below applies at all,
since STCS compactions can already run in parallel if needed.

Dean

On 7/1/13 10:24 AM, "Hiller, Dean" <Dean.Hiller@nrel.gov> wrote:

>We use playorm to manage 80,000 virtual column families (a playorm feature,
>though the pattern could be copied). We found out later, and are working on
>this now, that we want to map the 80,000 virtual CFs into 10 real CFs so
>that leveled compaction can run more in parallel; otherwise we get stuck
>with single-threaded LCS at the last tier, which can take a while. We are
>about to map/reduce our dataset into our newest format.
>
>Dean
>
>From: Kirk True <kirktrue.im@gmail.com>
>Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>Date: Monday, July 1, 2013 10:19 AM
>To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>Subject: 10,000s of column families/keyspaces
>
>Hi all,
>
>I know it's an old topic, but I want to see if anything's changed on the
>number of column families that C* supports, either in 1.2.x or 2.x.
>
>For a number of reasons [1], we'd like to support multi-tenancy via
>separate column families. The "problem" is that there are around 5,000
>tenants to support, and each one needs a small handful of column families.
>
>The last I heard C* supports 'a couple of hundred' column families before
>things start to bog down.
>
>What will it take for C* to support 50,000 column families?
>
>I'm about to dive into the code and run some tests, but I was curious
>about how to quantify the overhead of a column family. Is the limit due to
>performance? Memory? Does the off-heap work help here?
>
>Thanks,
>Kirk
>
>[1] The main three reasons:
>
> 1.  ability to wholesale drop data for a given tenant via drop
>     keyspace/drop CFs
> 2.  ability to have divergent schemas for each tenant (partially affected
>     by DSE Solr integration)
> 3.  secondary indexes per tenant (given requirement #2)
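
The virtual column family pattern Dean mentions (many logical CFs packed into
a handful of real ones) might look roughly like the sketch below. This is not
playorm's actual implementation: the `data_N` naming, the CRC32 bucketing, and
the `:` key separator are all assumptions made for illustration. Only the
figure of 10 real CFs comes from the thread.

```python
import zlib

# Number of real column families; 10 is the figure Dean gives in the thread.
REAL_CF_COUNT = 10


def real_cf_for(virtual_cf: str, real_cf_count: int = REAL_CF_COUNT) -> str:
    """Pick the real CF that hosts a virtual CF (hypothetical 'data_N' naming).

    CRC32 is used because it is stable across processes, unlike Python's
    built-in hash() for strings.
    """
    bucket = zlib.crc32(virtual_cf.encode("utf-8")) % real_cf_count
    return f"data_{bucket}"


def physical_row_key(virtual_cf: str, row_key: str) -> str:
    """Prefix the row key with the virtual CF name so that virtual CFs
    sharing one real CF do not collide.

    Assumes virtual CF names never contain ':' (an assumption of this sketch).
    """
    return f"{virtual_cf}:{row_key}"


if __name__ == "__main__":
    vcf = "tenant42_events"  # hypothetical virtual CF name
    print(real_cf_for(vcf), physical_row_key(vcf, "row1"))
```

One trade-off of this layout: reason #1 in Kirk's list (dropping a tenant
wholesale with drop keyspace/drop CF) no longer applies directly, because
several virtual CFs share each real CF and a tenant's rows would have to be
deleted by key prefix instead.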

