incubator-cassandra-user mailing list archives

From Ben Hood <>
Subject Re: Modeling multi-tenanted Cassandra schema
Date Thu, 14 Nov 2013 13:28:03 GMT
OK, so in the end I elected to go for option (c), which makes my table
definition look like this:

create table tenanted_foo_table (
    tenant ascii,
    application_key bigint,
    timestamp timestamp,
    -- .... other non-key columns
    PRIMARY KEY ((tenant, application_key), timestamp)
);

such that on disk the row keys are effectively tenant:application_key
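
For illustration, a read against this layout might look like the sketch below. The tenant value 'acme', the application key 42 and the date range are made-up values, not taken from the real application. The point is that both components of the composite partition key have to be supplied to address a partition, after which the clustering column can be sliced by range:

```cql
-- Hypothetical values: 'acme', 42 and the dates are assumptions for illustration.
-- Both partition key components (tenant, application_key) are supplied,
-- so the query is routed to a single partition.
SELECT *
  FROM tenanted_foo_table
 WHERE tenant = 'acme'
   AND application_key = 42
   -- timestamp is the clustering column, so a range slice within the partition works
   AND timestamp >= '2013-11-01 00:00:00+0000'
   AND timestamp <  '2013-11-14 00:00:00+0000';
```

As far as I understand it, a query that restricts only tenant (without application_key) cannot be served as a plain partition lookup with this key, so listing everything for one tenant would need a separate lookup table or similar.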

Thanks for your input,


On Wed, Nov 13, 2013 at 2:43 PM, Nate McCall <> wrote:
> Astyanax and/or the DS Java client depending on your use case. (Emphasis on
> the "and" - really no reason you can't use both - even on the same schema -
> depending on what you are doing as they both have their strengths and
> weaknesses).
> To be clear, Hector is not going away. We are still accepting patches and
> updates, but there is no active feature development.
> Any other hector specific questions, please start a thread over on
> On Wed, Nov 13, 2013 at 8:35 AM, Shahab Yunus <>
> wrote:
>> Nate,
>> (slightly OT), what client API/library is recommended now that Hector is
>> sunsetting? Thanks.
>> Regards,
>> Shahab
>> On Wed, Nov 13, 2013 at 9:28 AM, Nate McCall <>
>> wrote:
>>> You basically want option (c). Option (d) might work, but you would be
>>> bending the paradigm a bit, IMO. Certainly do not use dedicated column
>>> families or keyspaces per tenant. That never works. The list history will
>>> show that with a few Google searches and we've seen it fail badly with
>>> several clients.
>>> Overall, option (c) would be difficult to do in CQL without some very
>>> well thought out abstractions and/or a deep hack on the Java driver (not
>>> inelegant or impossible, just lots of moving parts to get your head around
>>> if you are new to such). That said, depending on the size of your project
>>> and skill of your team, this direction might be worth considering.
>>> Usergrid (just accepted for incubation at Apache) functions this way via
>>> the Thrift API:
>>> The commercial version of Usergrid has "tens of thousands" of active
>>> tenants on a single cluster (same code base at the service layer as the
>>> open source version). It uses Hector's built in virtual keyspaces:
>>> (NOTE: though
>>> Hector is sunsetting/in patch maintenance, the approach is certainly
>>> legitimate - but I'd recommend you *not* start a new project on Hector).
>>> In short, Usergrid is the only project I know of that has a well-proven
>>> tenant model that functions at scale, though I'm sure there are others
>>> around, just not open sourced or actually running large deployments.
>>> Astyanax can do this as well albeit with a little more work required:
>>> Happy to clarify any of the above.
>>> On Tue, Nov 12, 2013 at 3:19 AM, Ben Hood <> wrote:
>>>> Hi,
>>>> I've just received a requirement to make a Cassandra app
>>>> multi-tenanted, where we'll have up to 100 tenants.
>>>> Most of the tables are timestamped wide row tables with a natural
>>>> application key for the partitioning key and a timestamp key as a
>>>> clustering key.
>>>> So I was considering the options:
>>>> (a) Add a tenant column to each table and stick a secondary index on
>>>> that column;
>>>> (b) Add a tenant column to each table and maintain index tables that
>>>> use the tenant id as a partitioning key;
>>>> (c) Decompose the partitioning key of each table and add the tenant
>>>> as the leading component of the key;
>>>> (d) Add the tenant as a separate clustering key;
>>>> (e) Replicate the schema in separate tenant specific key spaces;
>>>> (f) Something I may have missed;
>>>> Option (a) seems the easiest, but I'm wary of just adding secondary
>>>> indexes without thinking about it.
>>>> Option (b) seems to have the least impact on the layout of the
>>>> storage, but carries the cost of maintaining each index table, both
>>>> code-wise and in terms of performance.
>>>> Option (c) seems quite straightforward, but I feel it might have a
>>>> significant effect on the distribution of the rows, if the cardinality
>>>> of the tenants is low.
>>>> Option (d) seems simple enough, but it would mean that you couldn't
>>>> query for a range of tenants without supplying a range of natural
>>>> application keys, through which you would need to iterate (under the
>>>> assumption that you don't use an ordered partitioner).
>>>> Option (e) appears relatively straightforward, but it does mean that
>>>> the application CQL client needs to maintain separate cluster
>>>> connections for each tenant. Also I'm not sure to what extent key
>>>> spaces were designed to partition identically structured data.
>>>> Does anybody have any experience with running a multi-tenanted
>>>> Cassandra app, or does this just depend too much on the specifics of
>>>> the application?
>>>> Cheers,
>>>> Ben
>>> --
>>> -----------------
>>> Nate McCall
>>> Austin, TX
>>> @zznate
>>> Co-Founder & Sr. Technical Consultant
>>> Apache Cassandra Consulting
