cassandra-user mailing list archives

From Shahab Yunus <>
Subject Re: Modeling multi-tenanted Cassandra schema
Date Wed, 13 Nov 2013 14:35:39 GMT

(Slightly OT) What client API/library is recommended now that Hector is
being sunset? Thanks.


On Wed, Nov 13, 2013 at 9:28 AM, Nate McCall <> wrote:

> You basically want option (c). Option (d) might work, but you would be
> bending the paradigm a bit, IMO. Certainly do not use dedicated column
> families or keyspaces per tenant. That never works. The list history will
> show that with a few Google searches, and we've seen it fail badly with
> several clients.
> Overall, option (c) would be difficult to do in CQL without some very
> well-thought-out abstractions and/or a deep hack on the Java driver (not
> inelegant or impossible, just lots of moving parts to get your head
> around if you are new to such). That said, depending on the size of your
> project and skill of your team, this direction might be worth considering.
> Usergrid (just accepted for incubation at Apache) functions this way via
> the Thrift API:
> The commercial version of Usergrid has "tens of thousands" of active
> tenants on a single cluster (same code base at the service layer as the
> open source version). It uses Hector's built-in virtual keyspaces:
> (NOTE: though Hector is sunsetting/in patch maintenance, the approach is
> certainly legitimate - but I'd recommend you *not* start a new project on
> Hector).
> In short, Usergrid is the only project I know of that has a well-proven
> tenant model that functions at scale, though I'm sure there are others
> around, just not open sourced or actually running large deployments.
> Astyanax can do this as well albeit with a little more work required:
> Happy to clarify any of the above.
> On Tue, Nov 12, 2013 at 3:19 AM, Ben Hood <> wrote:
>> Hi,
>> I've just received a requirement to make a Cassandra app
>> multi-tenanted, where we'll have up to 100 tenants.
>> Most of the tables are timestamped wide-row tables with a natural
>> application key as the partition key and a timestamp as the
>> clustering key.
>> So I was considering the options:
>> (a) Add a tenant column to each table and stick a secondary index on
>> that column;
>> (b) Add a tenant column to each table and maintain index tables that
>> use the tenant id as a partitioning key;
>> (c) Decompose the partitioning key of each table and add the tenant
>> as the leading component of the key;
>> (d) Add the tenant as a separate clustering key;
>> (e) Replicate the schema in separate tenant-specific keyspaces;
>> (f) Something I may have missed;
>> Option (a) seems the easiest, but I'm wary of just adding secondary
>> indexes without thinking about it.
>> Option (b) seems to have the least impact on the layout of the
>> storage, but carries the cost of maintaining each index table, both
>> code-wise and in terms of performance.
>> Option (c) seems quite straightforward, but I feel it might have a
>> significant effect on the distribution of the rows, if the cardinality
>> of the tenants is low.
>> Option (d) seems simple enough, but it would mean that you couldn't
>> query for a range of tenants without supplying a range of natural
>> application keys, through which you would need to iterate (under the
>> assumption that you don't use an ordered partitioner).
>> Option (e) appears relatively straightforward, but it does mean that
>> the application CQL client needs to maintain separate cluster
>> connections for each tenant. Also I'm not sure to what extent
>> keyspaces were designed to partition identically structured data.
>> Does anybody have any experience with running a multi-tenanted
>> Cassandra app, or does this just depend too much on the specifics of
>> the application?
>> Cheers,
>> Ben
> --
> -----------------
> Nate McCall
> Austin, TX
> @zznate
> Co-Founder & Sr. Technical Consultant
> Apache Cassandra Consulting
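
For anyone reading the archive, here is a minimal sketch of what option (c) could look like, assuming a CQL table whose partition key is the pair (tenant, natural application key) with the timestamp as clustering column. The table and column names (events_by_tenant, tenant_id, app_key, ts, payload) are hypothetical and not from the thread. The bucket_for helper is only a toy stand-in for a hashing partitioner (it uses MD5, not Cassandra's actual token function); it is there to illustrate that the whole composite key is hashed, so a small number of tenants does not by itself concentrate rows.

```python
import hashlib
from collections import Counter


def schema_cql(table: str) -> str:
    """Build a CREATE TABLE statement with a composite partition key.

    Column names are illustrative assumptions, not taken from the thread.
    """
    return (
        f"CREATE TABLE {table} (\n"
        "    tenant_id text,\n"
        "    app_key   text,\n"
        "    ts        timestamp,\n"
        "    payload   text,\n"
        "    PRIMARY KEY ((tenant_id, app_key), ts)\n"
        ");"
    )


def bucket_for(tenant_id: str, app_key: str, buckets: int = 8) -> int:
    """Toy partitioner: hash the whole composite key and map it to a bucket."""
    # Length-prefix the tenant so ("ab", "c") and ("a", "bc") cannot collide.
    composite = f"{len(tenant_id)}:{tenant_id}{app_key}".encode("utf-8")
    digest = hashlib.md5(composite).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def distribution(tenants, keys_per_tenant: int, buckets: int = 8) -> Counter:
    """Count how many (tenant, key) partitions land in each bucket."""
    counts = Counter()
    for tenant in tenants:
        for i in range(keys_per_tenant):
            counts[bucket_for(tenant, f"key-{i}", buckets)] += 1
    return counts


if __name__ == "__main__":
    print(schema_cql("events_by_tenant"))
    # Only two tenants, yet partitions spread over all buckets.
    print(sorted(distribution(["acme", "globex"], 1000).items()))
```

With this layout every query has to supply both tenant_id and app_key, which is where the abstraction work mentioned above comes in: the tenant has to be injected consistently by the data access layer rather than by each caller.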
