incubator-cassandra-user mailing list archives

From Steven A Robenalt <srobe...@stanford.edu>
Subject Re: Handling data consistency
Date Tue, 25 Feb 2014 00:30:20 GMT
Hi Kasper,

I am assuming that your friend list is symmetric (i.e. if I am your friend
then you are also my friend), which your comments seem to indicate.

First, I would suggest that you remove the friend's score from the
clustering key, which eliminates the need to read before writing.
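To illustrate why a score in the clustering key forces a read-before-write,
here is a minimal sketch with plain Python collections standing in for
tables (the variable and function names are hypothetical). Cassandra writes
are upserts on the full primary key, so with (user_id, score, friend_id) as
the key a new score creates a new row unless the old row is first read and
deleted, while with (user_id, friend_id) as the key a blind write simply
overwrites the existing row.

```python
# Model 1: primary key (user_id, score, friend_id), as in the original design.
# Each distinct score is a distinct key, i.e. a distinct row.
score_keyed: set[tuple[str, int, str]] = set()

# Model 2: primary key (user_id, friend_id); the score is a regular column.
friend_keyed: dict[tuple[str, str], int] = {}


def set_score_score_keyed(user: str, friend: str, score: int) -> None:
    # Blind write without reading the old score first: the old row survives.
    score_keyed.add((user, score, friend))


def set_score_friend_keyed(user: str, friend: str, score: int) -> None:
    # Blind write replaces whatever row exists for this friend (an upsert).
    friend_keyed[(user, friend)] = score


for new_score in (100, 250):
    set_score_score_keyed("alice", "bob", new_score)
    set_score_friend_keyed("alice", "bob", new_score)

# Model 1 now holds two rows for bob (a duplicate); model 2 holds one.
print(sorted(k for k in score_keyed if k[2] == "bob"))
print(friend_keyed)
```

The trade-off is the one in point 6 of the list that follows: with the score
out of the key, rows are no longer stored in score order, so the list has to
be sorted when it is read.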

With that in mind, here's what I'd recommend:

1) Each user has their own high score and timestamp as part of their own
attributes (thus keyed to their own user id).

2) Each user also has a list of their own friends, keyed to their own user
id and the friend's user_id, and including the friend's high score and
updated time as attributes.

3) When I add a friend, I copy their high score and timestamp to my own
friends list, and I copy my current high score and timestamp to their
friends list.

4) When I update my own high score, I grab the ids of all of my friends
from my own friends list and push my new high score and timestamp to each
of them.

5) When I remove a friend, I delete myself from their friends list and them
from mine.

6) When I view any friends list, the result must be sorted at that time,
presumably by descending high score (since the score is no longer part of
the clustering key).
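Points 1 through 6 can be sketched as a small in-memory model: plain Python
dicts stand in for two hypothetical tables, one holding each user's own score
and one holding each user's friends list, and the class and method names are
made up for illustration. Point 4 still reads my own friends list to find the
ids, but it never needs an old score in order to write a new one.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    # Point 1: user_id -> (high_score, updated_at) for the user's own score.
    scores: dict[str, tuple[int, int]] = field(default_factory=dict)
    # Point 2: (user_id, friend_id) -> the friend's (high_score, updated_at).
    friends: dict[tuple[str, str], tuple[int, int]] = field(default_factory=dict)

    def add_friend(self, me: str, them: str) -> None:
        # Point 3: copy each side's current score into the other's list.
        # Users without a score yet default to (0, 0), an assumption of this sketch.
        self.friends[(me, them)] = self.scores.get(them, (0, 0))
        self.friends[(them, me)] = self.scores.get(me, (0, 0))

    def set_score(self, user: str, score: int, ts: int) -> None:
        # Point 4: store my own score, then push it into each friend's list.
        self.scores[user] = (score, ts)
        friend_ids = [f for (owner, f) in self.friends if owner == user]
        for f in friend_ids:
            self.friends[(f, user)] = (score, ts)

    def remove_friend(self, me: str, them: str) -> None:
        # Point 5: delete the row in both directions.
        self.friends.pop((me, them), None)
        self.friends.pop((them, me), None)

    def leaderboard(self, user: str) -> list[tuple[str, int]]:
        # Point 6: the score is not part of the key, so sort at read time.
        rows = [(f, score) for (owner, f), (score, _ts) in self.friends.items()
                if owner == user]
        return sorted(rows, key=lambda r: (-r[1], r[0]))


s = Store()
s.set_score("bob", 300, ts=1)
s.set_score("carol", 500, ts=2)
s.add_friend("alice", "bob")
s.add_friend("alice", "carol")
s.set_score("bob", 900, ts=3)    # pushed into alice's list
print(s.leaderboard("alice"))    # [('bob', 900), ('carol', 500)]
s.remove_friend("alice", "bob")
print(s.leaderboard("alice"))    # [('carol', 500)]
```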

There are a couple of potential conflicts in this strategy. One may occur,
for example, if my friend updates a high score at the same time I drop
them as a friend (their update would end up re-inserting them into my
friends list after I dropped them). In this case, I'd simply need to drop
them again. This may or may not be acceptable in your application. Most
conflict situations could be resolved with lightweight transactions if you
are willing to sacrifice some performance (assuming the condition happens
often enough to warrant such treatment).
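The drop-versus-update race can be reproduced in a few lines. This is a
single-threaded simulation with hypothetical names: `push_upsert` behaves
like a plain write, which recreates a deleted row, and `push_if_exists` mimics
a conditional `UPDATE ... IF EXISTS`, which is the kind of lightweight
transaction mentioned above.

```python
Table = dict[tuple[str, str], int]  # (owner_id, friend_id) -> friend's high score


def push_upsert(table: Table, owner: str, friend: str, score: int) -> bool:
    # A plain UPDATE/INSERT is an upsert: it recreates a deleted row.
    table[(owner, friend)] = score
    return True


def push_if_exists(table: Table, owner: str, friend: str, score: int) -> bool:
    # Mimics a conditional "UPDATE ... IF EXISTS": skip rows that are gone.
    # Check-then-write is only safe here because this sketch is
    # single-threaded; in Cassandra a lightweight transaction makes it atomic.
    if (owner, friend) not in table:
        return False
    table[(owner, friend)] = score
    return True


def scenario(push) -> bool:
    """Return True if bob ends up back in alice's list after she drops him."""
    table: Table = {("alice", "bob"): 300, ("bob", "alice"): 0}
    # Bob reads his friend ids (his own partition) before pushing a new score.
    targets = [friend for (owner, friend) in table if owner == "bob"]
    # Alice drops bob in both directions after bob's read, before his writes.
    table.pop(("alice", "bob"), None)
    table.pop(("bob", "alice"), None)
    for owner in targets:
        push(table, owner, "bob", 900)
    return ("alice", "bob") in table


print(scenario(push_upsert))     # True: bob reappears in alice's list
print(scenario(push_if_exists))  # False: the stale push is rejected
```

The conditional version avoids the cleanup step at the cost of a
lightweight transaction on every pushed score, which is the performance
trade-off described above.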

For performance reasons, it may also be desirable to lump all of the high
score updates for a user in a batch statement if you have the option to do
so.
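The fan-out in point 4 maps naturally onto one batch. The sketch below only
builds CQL text and bind values; the table name `friends_by_user` and its
columns are hypothetical, following points 1 and 2. A real client would more
likely use its driver's batch API (for example `BatchStatement` in the
DataStax Python driver). Note the key swap: each write targets the friend's
partition, with my own id as the clustering key.

```python
def score_update_batch(me: str, friend_ids: list[str], score: int, ts: int):
    """Build a CQL batch pushing my new high score into each friend's list."""
    stmt = ("  UPDATE friends_by_user SET high_score = ?, updated_at = ? "
            "WHERE user_id = ? AND friend_id = ?;")
    cql = "\n".join(["BEGIN BATCH", *(stmt for _ in friend_ids), "APPLY BATCH;"])
    params: list = []
    for f in friend_ids:
        # Partition key is the friend's id; the clustering key is my id.
        params.extend([score, ts, f, me])
    return cql, params


cql, params = score_update_batch("alice", ["bob", "carol"], 900, 3)
print(cql)
print(params)  # [900, 3, 'bob', 'alice', 900, 3, 'carol', 'alice']
```

`BEGIN BATCH` without `UNLOGGED` is a logged batch, which makes the group of
writes eventually all-or-nothing across partitions; whether it is faster
than separate writes is worth measuring rather than assuming.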

Anyway, that eliminates the read-before-write problem, and also ensures
that if a high score is missed at the time a new user is added as a friend,
it can at least be updated later with the correct value. Not sure how much
help that is, but maybe it'll give you some ideas you can experiment with.
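With default write timestamps, a stale copy made in point 3 can still
overwrite a newer push from point 4 if it happens to be written last, and
then it is only corrected at the friend's next update. One option beyond
what's described above (an assumption, not part of the original suggestion)
is to write both with the score's own time as the write timestamp, for
example via CQL's `USING TIMESTAMP` (conventionally microseconds since the
epoch), so Cassandra's last-write-wins reconciliation keeps the newer score
regardless of arrival order. This sketch models only that rule; the names
are hypothetical.

```python
def apply_write(cells: dict, key: tuple[str, str], ts: int, value: int) -> None:
    # Last-write-wins: keep the write with the larger timestamp, whatever
    # order the writes arrive in. (Tie-breaking on equal timestamps is not
    # modeled here.)
    current = cells.get(key)
    if current is None or ts > current[0]:
        cells[key] = (ts, value)


cells: dict = {}
# Bob's point-4 push (score 900, written at ts=2) happens to land first...
apply_write(cells, ("alice", "bob"), 2, 900)
# ...then the point-3 copy of his older score (300 as of ts=1) lands late.
apply_write(cells, ("alice", "bob"), 1, 300)
print(cells[("alice", "bob")])  # (2, 900): the stale copy does not win
```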

Steve


On Mon, Feb 24, 2014 at 9:47 AM, Kasper Middelboe Petersen <
kasper@sybogames.com> wrote:

> Hi,
>
> My requirements include a system that can handle friend-based highscore
> lists (as a user I have a bunch of friends from various social sites like
> Facebook). The user must have a highscore list that consists of his
> friends only.
>
> I have implemented this using the user's ID as partition key and the
> friend's score and id as clustering keys. This keeps reads of the
> highscore list fast and straightforward.
>
> Updating a highscore is a bit cumbersome and suffers from
> read-before-write, but can largely be done without too much worry. The big
> issue here is I need to know the exact old highscore to be able to set a
> new one.
>
> The big problem arises when a new user connects and has many friends that
> need to be added. This can require quite a large number of queries, which
> could take some time:
>  - Look up the friend's user id based on the social credentials
>  - Look up the friend's highscore
>  - Look up the user's own highscore
>  - Add friend with his highscore to self
>  - Add self with own highscore to friend
>  - Update self with lastUpdatedFriends timestamp
>  - Update friend with lastUpdatedFriends timestamp
>
> Now if a new user has a bunch of friends, this could end up being quite a
> lot of queries, all suffering from the read-before-write problem. Should
> any of the friends set a new highscore between the lookup and the writes,
> the highscore would never be set correctly and duplicates would appear.
>
> I'm open to any suggestions, ranging from how to model this differently to
> avoid the read-before-write, to how to do this without risking duplicate
> data that would be extremely painful to track down again in the highscore
> lists.
>
>
> Thanks,
> Kasper
>



-- 
Steve Robenalt
Software Architect
HighWire | Stanford University
425 Broadway St, Redwood City, CA 94063

srobenal@stanford.edu
http://highwire.stanford.edu
