Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: domain of pkdogcom@gmail.com designates
 209.85.215.52 as permitted sender)
MIME-Version: 1.0
Date: Thu, 15 May 2014 20:14:11 -0700
Message-ID: 
 <CAL-nyFsH5srmBpA-YD-LE=NubXexfkUPatcwTaFH71DjRKTU5A@mail.gmail.com>
Subject: Data modeling for Pinterest-like application
From: ziju feng <pkdogcom@gmail.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=001a11349908b56d4804f97bce8f

--001a11349908b56d4804f97bce8f
Content-Type: text/plain; charset=UTF-8

Hello,

I'm working on data modeling for a Pinterest-like project. There are
basically two main concepts, Pin and Board, just like Pinterest: a pin is
an item containing an image, a description, and other information such as
a like count, and each board contains a sorted list of pins.

The board can be modeled with the primary key (board_id, created_at,
pin_id), where created_at is used to sort the pins of the board by date.
The question is whether I should denormalize the pin details into the
board table, or just retrieve pin ids by page (page size of 10 to 20) and
then multi-get by pin_id to obtain the details.
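
To make the normalized option concrete, here is a minimal in-memory sketch
of the page-then-multi-get read path. Plain Python dicts stand in for the
two tables; the names pins, boards, add_pin_to_board and read_board_page
are illustrative assumptions, not an actual schema or driver API.

```python
from bisect import insort

# Hypothetical in-memory stand-ins for the two tables (not real Cassandra calls).
pins = {}    # pin_id -> dict of pin details
boards = {}  # board_id -> list of (created_at, pin_id), kept sorted by created_at


def add_pin_to_board(board_id, created_at, pin_id):
    """Add an index row to a board; the board stores only ids, no pin details."""
    insort(boards.setdefault(board_id, []), (created_at, pin_id))


def read_board_page(board_id, page_size=20):
    """Normalized read: take one page of pin ids (newest first), then multi-get."""
    rows = boards.get(board_id, [])
    page = rows[::-1][:page_size]
    # One lookup per pin in the page: this is the multi-get step.
    return [pins[pin_id] for _, pin_id in page]
```

With this layout a pin update touches a single entry in pins, but every
page read costs one board query plus up to page_size pin lookups.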

Since some boards are accessed very often (like the home board),
denormalization seems a reasonable choice to improve read performance.
However, we would then have to update not only the pin table but also
each row in the board table that contains the pin whenever a pin is
updated, which can be quite frequent (for example, updating the like
count). Since a pin may be contained in many boards (possibly thousands),
denormalization seems to add a lot of load on the write side as well as
complexity in the application code.
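
The write-side cost described above can be sketched the same way. This is
a minimal in-memory illustration, not a Cassandra implementation: the
dicts and the functions pin_to_board and like_pin are hypothetical, and
the boards_by_pin reverse index is an added assumption, since something
like it is needed to find every board row that holds a copy of a pin.

```python
# Hypothetical in-memory sketch of the denormalized layout.
pins = {}           # pin_id -> dict of pin details, e.g. {"likes": 0}
board_rows = {}     # (board_id, created_at, pin_id) -> copy of the pin details
boards_by_pin = {}  # pin_id -> set of (board_id, created_at) rows holding a copy


def pin_to_board(board_id, created_at, pin_id):
    """Copy the pin's details into the board row and record where the copy lives."""
    board_rows[(board_id, created_at, pin_id)] = dict(pins[pin_id])
    boards_by_pin.setdefault(pin_id, set()).add((board_id, created_at))


def like_pin(pin_id):
    """Increment the like count on the pin and on every copy; return rows written."""
    pins[pin_id]["likes"] += 1
    writes = 1  # the pin table itself
    for board_id, created_at in boards_by_pin.get(pin_id, ()):
        board_rows[(board_id, created_at, pin_id)]["likes"] = pins[pin_id]["likes"]
        writes += 1  # one extra write per board containing the pin
    return writes
```

In this sketch a pin that sits on 1000 boards turns a single like into
1001 row writes, which is the write amplification in question.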

Any suggestions on whether our data model should be denormalized, or take
the normalized/multi-get approach, which might then need a separate caching
layer for reads?

Thanks,

Ziju

--001a11349908b56d4804f97bce8f--