cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Colin Clark <co...@clark.ws>
Subject Re: Mutable primary key in a table
Date Sun, 08 Feb 2015 15:21:11 GMT
No need for CAS in my suggestion - I would try to avoid the use of CAS if at all possible.
 

It’s better in a distributed environment to reduce dimensionality and isolate write/read
paths (event sourcing and CQRS patterns).

Also, just in general, changing the primary key on an update is usually considered a bad idea
and is simply not even permitted by most RDBMS.
—
Colin Clark
colin@clark.ws
+1 320-221-9531
skype colin.p.clark

> On Feb 8, 2015, at 4:16 PM, Eric Stevens <mightye@gmail.com> wrote:
> 
> It sounds like changing user names is the kind of thing which doesn't happen often, in
which case you probably don't have to worry too much about the additional overhead of using
logged batches (not like you're going to be doing hundreds to thousands of these per second).
 You probably also want to look into conditional updates (search for Compare And Set - CAS)
to help avoid collisions when creating or renaming users.
> 
> Colin's suggestion of using a surrogate key for the primary key on the user table is
also a good idea, but you'll still want to use CAS to help maintain the integrity of your
data.  Note that CAS has a similar overhead to logged batches in that it also involves a Paxos
round.  So keep the number of statements in either CAS or logged batches as minimal as possible.
> 
> On Sun, Feb 8, 2015 at 7:17 AM, Colin <colin@clark.ws <mailto:colin@clark.ws>>
wrote:
> Another way to do this is to use a time based uuid for the primary key (partition key)
and to store the user name with that uuid.
> 
> In addition, you'll need 2 additonal tables, one that is used to get the uuid by user
name and another to track user name changes over time which would be organized by uuid, and
user name (cluster on the name).
> 
> This pattern is referred to as an inverted index and provides a lot of power and flexibility
once mastered.  I use it all the time with cassandra - in fact, to be successful with cassandra,
it might actually be a requirement!
> 
> --
> Colin Clark 
> +1 612 859 6129 <tel:%2B1%20612%20859%206129>
> Skype colin.p.clark
> 
> On Feb 8, 2015, at 8:08 AM, Jack Krupansky <jack.krupansky@gmail.com <mailto:jack.krupansky@gmail.com>>
wrote:
> 
>> What is your full primary key? Specifically, what is the partition key, as opposed
to clustering columns?
>> 
>> The point is that the partition key for a row is hashed to determine the token for
the partition, which in turn determines which node of the cluster owns that partition. Changing
the partition key means that potentially the partition would need to be "moved" to another
node, which is clearly not something that Cassandra would do since the core design of Cassandra
is that all operations should be blazingly fast and to refrain from offering slow features.
>> 
>> I would recommend that your application:
>> 
>> 1. Read the existing user data
>> 2. Create a new user, using the existing user data.
>> 3. Update the old user row to indicate that it is no longer a valid user. Actually,
you will have to decide on an application policy for old user names. For example, can they
be reused, or are they locked, or... whatever.
>> 
>> 
>> -- Jack Krupansky
>> 
>> On Sun, Feb 8, 2015 at 1:48 AM, Ajaya Agrawal <ajku.agr@gmail.com <mailto:ajku.agr@gmail.com>>
wrote:
>> 
>> On Sun, Feb 8, 2015 at 5:03 AM, Eric Stevens <mightye@gmail.com <mailto:mightye@gmail.com>>
wrote:
>> I'm struggling to think of a model where it makes sense to update a primary key as
a typical operation.  It suggests, as Adil said, that you may be reasoning wrong about your
data model.  Maybe you can explain your problem in more detail - what kind of thing has you
updating your PK on a regular basis?
>> 
>> I have a 'user' table which has a column called 'user_name' and other columns like
name, city etc. The application requires that user_name be unique and user should be searchable
by 'user_name'. The only way to do this in C* would be to make user_name column primary key.
Things get trickier when there is a requirement which says that user_name can be changed by
the users of the application. This a distributed application which mean that it runs on multiple
nodes. If I have to change user_name atomically then either I need to implement distributed
locking or use something C* provides.   
>> 
>> 
> 


Mime
View raw message