Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: local policy includes SPF record at
 spf.trusted-forwarder.org)
MIME-Version: 1.0
In-Reply-To: 
 <CAKtx059SmSramFw+jog5PXKCuD-3b4OXujwK83bXER4o37acRw@mail.gmail.com>
References: 
 <CAKtx059SmSramFw+jog5PXKCuD-3b4OXujwK83bXER4o37acRw@mail.gmail.com>
Date: Thu, 23 Jan 2014 09:34:23 +0100
Message-ID: 
 <CAKtx05_FVugvegoPX3=O3AhYd5AARe5MoxBjyWWe8OMyKNT_Cw@mail.gmail.com>
Subject: Re: Datamodel for a highscore list
From: Kasper Middelboe Petersen <kasper@sybogames.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=001a113aaa3ecdfc4504f09f1bee

--001a113aaa3ecdfc4504f09f1bee
Content-Type: text/plain; charset=ISO-8859-1

What would the consequences be of using this updated highscore table (with
friendId as part of the clustering key to avoid name collisions):

CREATE TABLE highscore (
  userId uuid,
  score int,
  friendId uuid,
  name varchar,
  PRIMARY KEY(userId, score, friendId)
) WITH CLUSTERING ORDER BY (score DESC);

And then create an index:

CREATE INDEX friendId_idx ON highscore ( friendId );

The table will have many millions of entries (I expect 100+ million).
Each friendId would appear as many times as the user has friends. It sounds
like a scenario where I should be careful about using a custom index.

I haven't worked with custom indexes in Cassandra before, but I assume this
would allow me to query the table based on (userId, friendId) for updating
highscores.
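
As a rough sketch, the update flow would presumably look like this (the ?
bind markers are placeholders; whether the lookup in step 1 is accepted
without ALLOW FILTERING, and how well it performs, depends on the Cassandra
version and is an assumption here):

-- 1. Look up the friend's current row in this user's list via the index
SELECT score FROM highscore WHERE userId = ? AND friendId = ?;

-- 2. score is part of the primary key, so it cannot be updated in place:
--    delete the old row and insert a new one with the new score
DELETE FROM highscore WHERE userId = ? AND score = ? AND friendId = ?;
INSERT INTO highscore (userId, score, friendId, name) VALUES (?, ?, ?, ?);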

But what would happen in this case? What queries would be affected and
roughly to what degree?

Would this be a viable option?


On Wed, Jan 22, 2014 at 6:44 PM, Kasper Middelboe Petersen <
kasper@sybogames.com> wrote:

> Hi!
>
> I'm a little worried about the data model I have come up with for handling
> highscores.
>
> I have a lot of users. Each user has a number of friends. I need a
> highscore list per friend list.
>
> I would like to have it optimized for reading the highscores as opposed to
> setting a new highscore, as the use case suggests I would need to read
> the list a lot more often than I would need to write new highscores.
>
> Currently I have the following tables:
> CREATE TABLE user (userId uuid, name varchar, highscore int, bestcombo
> int, PRIMARY KEY(userId));
> CREATE TABLE highscore (userId uuid, score int, name varchar, PRIMARY
> KEY(userId, score, name)) WITH CLUSTERING ORDER BY (score DESC);
> ... and a table for friends - for the purpose of this mail assume
> everyone is friends with everyone else
>
> Reading the highscore list for a given user is easy: SELECT * FROM
> highscore WHERE userId = <id>.
>
> The problem is setting a new highscore.
> 1. I need to read-before-write to get the old score
> 2. I'm screwed if something goes wrong and the old score gets overwritten
> before all the friends' highscore lists get updated - and it is a highly
> visible error, because the same user then appears on the highscore list
> multiple times.
>
> I would very much appreciate some feedback and/or alternative ways to
> solve this with Cassandra.
>
>
> Thanks,
> Kasper
>

--001a113aaa3ecdfc4504f09f1bee--