incubator-cassandra-user mailing list archives

From Kasper Middelboe Petersen <>
Subject Re: Handling data consistency
Date Tue, 25 Feb 2014 09:11:48 GMT
Hi Steve,

I've considered this approach before, and I'm partial to going this way.

The reason I haven't yet is that I'm fairly confident the usage patterns
will involve far more highscore list reads than highscore updates or friend
additions. Most highscore reads fetch either the 5 scores closest to a given
score or the top 5. If a user has 100 friends, it seems excessive to read
all 100 entries just to get the top 5. With the score as part of the
clustering key, this can be done efficiently.
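To make the read-cost argument concrete, here is a minimal Python sketch. It is an in-memory stand-in, not Cassandra or any driver API: it assumes a partition whose rows are kept ordered by score descending (what a clustering order on score would provide) and compares it with sorting an unordered friends map at read time. The helper names (`insert_sorted`, `top_n_sorted`, `closest_n_sorted`, `top_n_unsorted`) are hypothetical.

```python
import bisect

# Hypothetical stand-in for one user's highscore partition. Rows are
# (-score, friend_id) tuples kept in sorted order, mimicking a clustering
# order of score DESC with friend_id as a tie-breaker.
def insert_sorted(rows, friend_id, score):
    bisect.insort(rows, (-score, friend_id))

def top_n_sorted(rows, n=5):
    # With score in the clustering key, the top n are just the first n rows.
    return [(fid, -neg) for neg, fid in rows[:n]]

def closest_n_sorted(rows, score, n=5):
    # Binary-search to where `score` would sit, then inspect only n rows on
    # each side of that position instead of the whole partition.
    i = bisect.bisect_left(rows, (-score,))
    window = rows[max(0, i - n): i + n]
    window.sort(key=lambda r: (abs(-r[0] - score), r[1]))
    return [(fid, -neg) for neg, fid in window[:n]]

def top_n_unsorted(friend_scores, n=5):
    # Without score in the clustering key, every friend row has to be read
    # and sorted before the top n are known.
    ranked = sorted(friend_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]
```

In this sketch the unsorted path touches all 100 rows for a 100-friend user on every query, while the ordered path looks at 5 rows for the top 5 and at most 10 rows around the target score.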

Do you have any thoughts on this?


On Tue, Feb 25, 2014 at 1:30 AM, Steven A Robenalt <> wrote:

> Hi Kasper,
> I am assuming that your friend list is symmetric (i.e., if I am your friend
> then you are also my friend), which your comments seem to indicate.
> First, I would suggest that you drop the friend's score as part of the
> clustering key, which eliminates the need to read-before-write.
> With that in mind, here's what I'd recommend:
> 1) Each user has their own high score and timestamp as part of their own
> attributes (thus keyed to their own user id).
> 2) Each user also has a list of their own friends, keyed to their own user
> id and the friend's user_id, and including the friend's high score and
> updated time as attributes.
> 3) When I add a friend, I copy their high score and timestamp to my own
> friends list, and I copy my current high score and timestamp to their
> friends list.
> 4) When I update my own high score, I grab the ids of all of my friends
> from my own friends list and push my new high score and timestamp to each
> of them.
> 5) When I remove a friend, I delete myself from their friends list and
> them from mine.
> 6) When I view any friends list, the result must be sorted at that time,
> presumably by descending high score (since the score is no longer part of
> the clustering key).
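Steve's six steps can be sketched as a small in-memory Python model. This is only an illustration under stated assumptions: two dicts stand in for the per-user score row and the per-user friends table, writes are blind upserts (as Cassandra INSERT and UPDATE are), and a missing score defaults to `(0, 0)`. The class and method names are hypothetical.

```python
class FriendScoreModel:
    """Hypothetical in-memory model of the proposed denormalized layout."""

    def __init__(self):
        self.own = {}      # step 1: user_id -> (high_score, updated_at)
        self.friends = {}  # step 2: user_id -> {friend_id: (high_score, updated_at)}

    def add_friend(self, me, friend):
        # Step 3: copy each side's current score into the other's list.
        self.friends.setdefault(me, {})[friend] = self.own.get(friend, (0, 0))
        self.friends.setdefault(friend, {})[me] = self.own.get(me, (0, 0))

    def update_high_score(self, me, score, ts):
        # Step 4: store my score, then push it to every friend's list.
        # Only my own friend ids are read; the target rows are overwritten
        # blindly, so no old score is needed (no read-before-write).
        self.own[me] = (score, ts)
        for friend in self.friends.get(me, {}):
            self.friends.setdefault(friend, {})[me] = (score, ts)

    def remove_friend(self, me, friend):
        # Step 5: delete both directions.
        self.friends.get(me, {}).pop(friend, None)
        self.friends.get(friend, {}).pop(me, None)

    def view(self, me):
        # Step 6: sort at read time, highest score first.
        rows = self.friends.get(me, {})
        return sorted(rows.items(), key=lambda kv: (-kv[1][0], kv[0]))
```

Because `update_high_score` overwrites rather than deletes and re-inserts, a score that was stale when a friend was added is corrected by the friend's next update, which matches the point made later in this reply.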
> There are a couple of potential conflicts in this strategy. One may occur,
> for example, if my friend updates a high score at the same time I drop
> them as a friend (they would end up re-inserting themselves into my friend
> list after I dropped them). In this case, I'd simply need to drop them
> again. This may or may not be acceptable in your application. Most conflict
> situations could be resolved with lightweight transactions if you are
> willing to sacrifice some performance (and assuming the condition happens
> often enough to warrant such treatment).
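The remove-versus-update conflict can be reproduced with a tiny simulation. Here `conditional=True` loosely stands in for a lightweight transaction such as an `UPDATE ... IF EXISTS`, which only writes when the row is still present. The dict-based store and the function names are hypothetical, and the sketch ignores the real cost and semantics of Cassandra's Paxos-based writes.

```python
def push_score(friends, target, source, score, conditional):
    """Write `source`'s score into `target`'s friend list.

    With conditional=True the write is skipped if the row no longer exists,
    loosely mimicking a lightweight transaction with IF EXISTS.
    """
    rows = friends.setdefault(target, {})
    if conditional and source not in rows:
        return False
    rows[source] = score
    return True

def drop_during_update(conditional):
    # Hypothetical starting state: "me" and "bob" are friends.
    friends = {"me": {"bob": 10}, "bob": {"me": 5}}
    # 1. Bob starts a high-score update by reading his friend ids.
    targets = list(friends["bob"])
    # 2. Meanwhile, I drop Bob in both directions.
    friends["me"].pop("bob", None)
    friends["bob"].pop("me", None)
    # 3. Bob's update now pushes his new score to the ids read in step 1.
    for target in targets:
        push_score(friends, target, "bob", 42, conditional)
    return friends["me"]
```

With a plain upsert, Bob reappears in my list with score 42; with the conditional write, the late push is rejected because the row is gone.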
> For performance reasons, it may also be desirable to lump all of the high
> score updates for a user in a batch statement if you have the option to do
> so.
> Anyway, that eliminates the read-before-write problem, and also ensures
> that if a high score is missed at the time a new user is added as a friend,
> it can at least be updated later with the correct value. Not sure how much
> help that is, but maybe it'll give you some ideas you can experiment with.
> Steve
> On Mon, Feb 24, 2014 at 9:47 AM, Kasper Middelboe Petersen <> wrote:
>> Hi,
>> My requirements include a system that can handle friend-based highscore
>> lists (as a user, I have a bunch of friends from various social sites like
>> Facebook). The user must have a highscore list that consists of his friends
>> only.
>> I have implemented this using the user's ID as the partition key and the
>> friend's score and ID as clustering keys. This keeps reads of the highscore
>> list fast and straightforward.
>> Updating a highscore is a bit cumbersome and suffers from
>> read-before-write, but can largely be done without too much worry. The big
>> issue here is that I need to know the exact old highscore to be able to set
>> a new one.
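A minimal sketch of why this layout forces read-before-write: when the score is part of the primary key, changing it means deleting the old row and inserting a new one, and the delete must name the exact old score. The class below is a hypothetical in-memory stand-in for a table keyed by `(user_id, score, friend_id)`; it is not the actual schema.

```python
class ClusteredHighscores:
    """Hypothetical stand-in for PRIMARY KEY (user_id, score, friend_id)."""

    def __init__(self):
        self.rows = {}  # user_id -> set of (score, friend_id) clustering keys

    def insert(self, user, score, friend):
        self.rows.setdefault(user, set()).add((score, friend))

    def delete(self, user, score, friend):
        # A row can only be addressed by its full key, old score included.
        self.rows.get(user, set()).discard((score, friend))

    def update_score(self, user, friend, old_score, new_score):
        # "Updating" a clustering column is really delete + insert.
        self.delete(user, old_score, friend)
        self.insert(user, new_score, friend)

    def entries_for(self, user, friend):
        return sorted(s for s, f in self.rows.get(user, set()) if f == friend)
```

If the caller passes the wrong old score, for example because it was read before a concurrent change, the delete silently misses and the friend ends up listed twice.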
>> The big problem arises when a new user connects and has many friends that
>> need to be added. This can require quite an extensive number of queries,
>> which could take some time to complete:
>>  - Look up the friend's user ID based on the social credentials
>>  - Look up the friend's highscore
>>  - Look up the user's own highscore
>>  - Add friend with his highscore to self
>>  - Add self with own highscore to friend
>>  - Update self with lastUpdatedFriends timestamp
>>  - Update friend with lastUpdatedFriends timestamp
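Counting the statements in the list above shows how quickly this grows. The sketch assumes the own-highscore lookup and the self `lastUpdatedFriends` update happen once per connect, while the other five steps run once per friend; the function name and step labels are hypothetical.

```python
def connect_plan(user, social_ids):
    """Hypothetical list of (kind, description) statements for a first connect."""
    plan = [("read", f"own highscore of {user}")]
    for sid in social_ids:
        plan += [
            ("read", f"user id for social id {sid}"),
            ("read", f"highscore of friend {sid}"),
            ("write", f"add friend {sid} with highscore to {user}"),
            ("write", f"add {user} with own highscore to friend {sid}"),
            ("write", f"lastUpdatedFriends on friend {sid}"),
        ]
    plan.append(("write", f"lastUpdatedFriends on {user}"))
    return plan

plan = connect_plan("new_user", [f"fb:{i}" for i in range(100)])
reads = sum(1 for kind, _ in plan if kind == "read")
writes = len(plan) - reads
print(len(plan), reads, writes)  # 502 201 301
```

Under these assumptions, 100 friends means 502 statements, and the writes depend on values obtained by the 201 reads.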
>> Now if a new user has a bunch of friends, this could end up being quite a
>> lot of queries, all suffering from the read-before-write problem. Should
>> any of the friends set a new highscore between the lookup and the writes,
>> the highscore would never be set correctly and duplicates would occur.
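The race can be reproduced deterministically. This hypothetical sketch uses the score-in-the-key layout (each list holds `(score, friend_id)` rows) and interleaves a friend's concurrent highscore update with the connect steps. Depending on the order of the two "add" writes, the result is either a stale score or a duplicate entry. Names and values are illustrative only.

```python
def connect_with_concurrent_update(friend_row_first):
    """Hypothetical interleaving: "me" connects while "bob" sets a new highscore."""
    own = {"me": 5, "bob": 10}
    lists = {"me": set(), "bob": set()}  # user -> {(score, friend_id)}

    def bob_sets_highscore(new):
        # Bob's update: delete (old, bob) and insert (new, bob) in the list of
        # every friend currently present in Bob's own list.
        old = own["bob"]
        own["bob"] = new
        for _, friend in list(lists["bob"]):
            lists[friend].discard((old, "bob"))
            lists[friend].add((new, "bob"))

    seen = own["bob"]  # "me" looks up Bob's highscore (reads 10)
    if friend_row_first:
        lists["me"].add((seen, "bob"))       # add friend to self
        bob_sets_highscore(20)               # concurrent update lands here
        lists["bob"].add((own["me"], "me"))  # add self to friend
    else:
        lists["bob"].add((own["me"], "me"))  # add self to friend
        bob_sets_highscore(20)               # concurrent update lands here
        lists["me"].add((seen, "bob"))       # add friend to self
    return sorted(lists["me"])
```

In the first ordering Bob's update never reaches the new user's list, so the stale 10 remains; in the second, the update inserts 20 before the connect writes the stale 10, leaving two rows for Bob.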
>> I'm open to any suggestions, ranging from how to model this differently to
>> avoid the read-before-write, to how to do this without risking duplicate
>> data that would be extremely painful to track down in highscore lists.
>> Thanks,
>> Kasper
> --
> Steve Robenalt
> Software Architect
> HighWire | Stanford University
> 425 Broadway St, Redwood City, CA 94063
