incubator-cassandra-user mailing list archives

From Ben Hood <0x6e6...@gmail.com>
Subject Re: Modeling multi-tenanted Cassandra schema
Date Thu, 14 Nov 2013 13:28:03 GMT
OK, so in the end I elected to go for option (c), which makes my table
definition look like this:

create table tenanted_foo_table (
    tenant ascii,
    application_key bigint,
    timestamp timestamp,
    -- .... other non-key columns
    PRIMARY KEY ((tenant, application_key), timestamp)
);

such that on disk the row keys are effectively tenant:application_key
concatenations.
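
As a rough sketch of what reads and writes look like against this layout
(the payload column and all literal values are made up for illustration;
they stand in for the elided non-key columns):

```cql
-- Hypothetical non-key column "payload text" stands in for the elided columns.
INSERT INTO tenanted_foo_table (tenant, application_key, timestamp, payload)
VALUES ('acme', 42, '2013-11-14 13:28:03+0000', 'hello');

-- Both partition key components are supplied with equality, so the read
-- targets a single partition; the clustering column can then be ranged over.
SELECT * FROM tenanted_foo_table
 WHERE tenant = 'acme'
   AND application_key = 42
   AND timestamp >= '2013-11-01 00:00:00+0000'
   AND timestamp <  '2013-12-01 00:00:00+0000';
```

Since tenant and application_key together form the partition key, a query
that supplies the tenant alone cannot be answered from a single partition.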

Thanks for your input,

Ben

On Wed, Nov 13, 2013 at 2:43 PM, Nate McCall <nate@thelastpickle.com> wrote:
> Astyanax and/or the DS Java client depending on your use case. (Emphasis on
> the "and" - really no reason you can't use both - even on the same schema -
> depending on what you are doing as they both have their strengths and
> weaknesses).
>
> To be clear, Hector is not going away. We are still accepting patches and
> updates, but there is no active feature development.
>
> Any other hector specific questions, please start a thread over on
> hector-users@googlegroups.com
>
>
> On Wed, Nov 13, 2013 at 8:35 AM, Shahab Yunus <shahab.yunus@gmail.com>
> wrote:
>>
>> Nate,
>>
>> (slightly OT), what client API/library is recommended now that Hector is
>> sunsetting? Thanks.
>>
>> Regards,
>> Shahab
>>
>>
>> On Wed, Nov 13, 2013 at 9:28 AM, Nate McCall <nate@thelastpickle.com>
>> wrote:
>>>
>>> You basically want option (c). Option (d) might work, but you would be
>>> bending the paradigm a bit, IMO. Certainly do not use dedicated column
>>> families or keyspaces per tenant. That never works. The list history will
>>> show that with a few Google searches, and we've seen it fail badly with
>>> several clients.
>>>
>>> Overall, option (c) would be difficult to do in CQL without some very
>>> well-thought-out abstractions and/or a deep hack on the Java driver (not
>>> inelegant or impossible, just lots of moving parts to get your head around
>>> if you are new to such). That said, depending on the size of your project
>>> and skill of your team, this direction might be worth considering.
>>>
>>> Usergrid (just accepted for incubation at Apache) functions this way via
>>> the Thrift API: https://github.com/apigee/usergrid-stack
>>>
>>> The commercial version of Usergrid has "tens of thousands" of active
>>> tenants on a single cluster (same code base at the service layer as the
>>> open source version). It uses Hector's built-in virtual keyspaces:
>>> https://github.com/hector-client/hector/wiki/Virtual-Keyspaces (NOTE: though
>>> Hector is sunsetting/in patch maintenance, the approach is certainly
>>> legitimate - but I'd recommend you *not* start a new project on Hector).
>>>
>>> In short, Usergrid is the only project I know of that has a well-proven
>>> tenant model that functions at scale, though I'm sure there are others
>>> around, just not open sourced or actually running large deployments.
>>>
>>> Astyanax can do this as well albeit with a little more work required:
>>>
>>> https://github.com/Netflix/astyanax/wiki/Composite-columns#how-to-use-the-prefixedserializer-but-you-really-should-use-composite-columns
>>>
>>> Happy to clarify any of the above.
>>>
>>>
>>> On Tue, Nov 12, 2013 at 3:19 AM, Ben Hood <0x6e6562@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I've just received a requirement to make a Cassandra app
>>>> multi-tenanted, where we'll have up to 100 tenants.
>>>>
>>>> Most of the tables are timestamped wide row tables with a natural
>>>> application key for the partitioning key and a timestamp key as a
>>>> clustering key.
>>>>
>>>> So I was considering the options:
>>>>
>>>> (a) Add a tenant column to each table and stick a secondary index on
>>>> that column;
>>>> (b) Add a tenant column to each table and maintain index tables that
>>>> use the tenant id as a partitioning key;
>>>> (c) Decompose the partitioning key of each table and add the tenant
>>>> as the leading component of the key;
>>>> (d) Add the tenant as a separate clustering key;
>>>> (e) Replicate the schema in separate tenant-specific keyspaces;
>>>> (f) Something I may have missed;
>>>>
>>>> Option (a) seems the easiest, but I'm wary of just adding secondary
>>>> indexes without thinking about it.
>>>>
>>>> Option (b) seems to have the least impact on the layout of the
>>>> storage, but carries the cost of maintaining each index table, both
>>>> code-wise and in terms of performance.
>>>>
>>>> Option (c) seems quite straightforward, but I feel it might have a
>>>> significant effect on the distribution of the rows, if the cardinality
>>>> of the tenants is low.
>>>>
>>>> Option (d) seems simple enough, but it would mean that you couldn't
>>>> query for a range of tenants without supplying a range of natural
>>>> application keys, through which you would need to iterate (under the
>>>> assumption that you don't use an ordered partitioner).
>>>>
>>>> Option (e) appears relatively straightforward, but it does mean that
>>>> the application CQL client needs to maintain separate cluster
>>>> connections for each tenant. Also I'm not sure to what extent
>>>> keyspaces were designed to partition identically structured data.
>>>>
>>>> Does anybody have any experience with running a multi-tenanted
>>>> Cassandra app, or does this just depend too much on the specifics of
>>>> the application?
>>>>
>>>> Cheers,
>>>>
>>>> Ben
>>>
>>>
>>>
>>>
>>> --
>>> -----------------
>>> Nate McCall
>>> Austin, TX
>>> @zznate
>>>
>>> Co-Founder & Sr. Technical Consultant
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>
>>
>
>
