Subject: Data modeling for Pinterest-like application
From: ziju feng
To: user@cassandra.apache.org
Date: Thu, 15 May 2014 20:14:11 -0700

Hello,

I'm working on data modeling for a Pinterest-like project. There are basically two main concepts, Pin and Board, just as in Pinterest: a pin is an item containing an image, a description, and some other information such as a like count, and each board contains a sorted list of pins.
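A minimal Python sketch of the two concepts as described, assuming integer timestamps and hypothetical field names (`image_url`, `description`, `like_count`): a pin's details live in one place, and a board holds only an ordered list of pin IDs, newest first. It is an illustration of the shape of the data, not a Cassandra schema.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Pin:
    # Hypothetical fields mirroring the description: image, description, like count.
    pin_id: str
    image_url: str
    description: str
    like_count: int = 0


@dataclass
class Board:
    board_id: str
    # Kept sorted as (negated created_at, pin_id): newest pin first, pin_id breaks ties.
    entries: List[Tuple[int, str]] = field(default_factory=list)

    def add_pin(self, pin_id: str, created_at: int) -> None:
        # created_at is an integer timestamp (an assumption for this sketch).
        bisect.insort(self.entries, (-created_at, pin_id))

    def pin_ids(self) -> List[str]:
        return [pin_id for _, pin_id in self.entries]


# Pin details are stored once; the board only references pins by ID.
pins = {pid: Pin(pid, f"https://example.com/{pid}.jpg", "demo") for pid in ("p1", "p2", "p3")}
home = Board("home")
home.add_pin("p1", created_at=100)
home.add_pin("p3", created_at=300)
home.add_pin("p2", created_at=200)
print(home.pin_ids())  # ['p3', 'p2', 'p1']
```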

The board can be modeled with the primary key (board_id, created_at, pin_id), where created_at is used to sort the board's pins by date. The question is whether I should denormalize pin details into the board table, or just retrieve the board's pins one page at a time (page size can be 10 to 20) and then multi-get by pin_id to obtain their details.
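In CQL the board table might be declared with `PRIMARY KEY (board_id, created_at, pin_id)` and `WITH CLUSTERING ORDER BY (created_at DESC, pin_id ASC)`. The Python sketch below only simulates that layout in memory to show the normalized read path: fetch one page of pin IDs, resuming after the last (created_at, pin_id) seen, then multi-get the details. The function names are hypothetical and this is not a Cassandra driver API.

```python
from typing import Dict, List, Optional, Tuple

# In-memory stand-in for a board table with PRIMARY KEY (board_id, created_at, pin_id):
# board_id plays the partition key, and rows are read newest-first.
BoardRows = Dict[str, List[Tuple[int, str]]]  # board_id -> [(created_at, pin_id), ...]


def page_of_pin_ids(rows: BoardRows, board_id: str, page_size: int,
                    after: Optional[Tuple[int, str]] = None) -> List[Tuple[int, str]]:
    """Return up to page_size (created_at, pin_id) rows that sort after the cursor."""
    ordered = sorted(rows.get(board_id, []), key=lambda r: (-r[0], r[1]))
    if after is not None:
        cursor_key = (-after[0], after[1])
        ordered = [r for r in ordered if (-r[0], r[1]) > cursor_key]
    return ordered[:page_size]


def multi_get(pins: Dict[str, dict], pin_ids: List[str]) -> List[dict]:
    """Fetch pin details by ID, preserving board order and skipping deleted pins."""
    return [pins[pid] for pid in pin_ids if pid in pins]


rows = {"home": [(100, "p1"), (300, "p3"), (200, "p2")]}
pins = {"p1": {"desc": "a"}, "p2": {"desc": "b"}, "p3": {"desc": "c"}}

first = page_of_pin_ids(rows, "home", page_size=2)
second = page_of_pin_ids(rows, "home", page_size=2, after=first[-1])
print([pid for _, pid in first], [pid for _, pid in second])  # ['p3', 'p2'] ['p1']
print(multi_get(pins, [pid for _, pid in first]))  # [{'desc': 'c'}, {'desc': 'b'}]
```

Paging by the last clustering key seen, rather than by an offset, means each page read starts directly at the right position within the board's partition.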

Since some boards are accessed very often (like the home board), denormalization seems a reasonable way to improve read performance. However, whenever a pin is updated, we would then have to update not only the pin table but also every row in the board table that contains the pin, and such updates can be quite frequent (for example, updating the like count). Since a pin may be contained in many boards (possibly thousands), denormalization seems to add a lot of write load as well as application code complexity.
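To make the write-side cost concrete, here is a sketch of the denormalized path in plain Python, assuming a hypothetical reverse index from pin_id to the board rows that hold a copy of it. Every like-count change becomes one write to the pin plus one write per board containing it.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

RowKey = Tuple[str, int, str]  # (board_id, created_at, pin_id)

# Denormalized board rows: each row carries its own copy of the pin's details.
board_rows: Dict[RowKey, dict] = {}
# Hypothetical reverse index so an update can find every row holding a copy of a pin.
rows_by_pin: Dict[str, Set[RowKey]] = defaultdict(set)
pins: Dict[str, dict] = {}


def add_pin_to_board(board_id: str, created_at: int, pin_id: str) -> None:
    key = (board_id, created_at, pin_id)
    board_rows[key] = dict(pins[pin_id])  # copy the pin's details into the board row
    rows_by_pin[pin_id].add(key)


def update_like_count(pin_id: str, like_count: int) -> int:
    """Update the pin and every denormalized copy; return the number of writes issued."""
    pins[pin_id]["like_count"] = like_count
    writes = 1
    for key in rows_by_pin[pin_id]:
        board_rows[key]["like_count"] = like_count
        writes += 1
    return writes


pins["p1"] = {"description": "sunset", "like_count": 0}
for i in range(1000):  # one pin saved to 1,000 boards
    add_pin_to_board(f"board-{i}", created_at=i, pin_id="p1")

print(update_like_count("p1", 42))  # 1001: one pin write plus 1,000 board-row writes
```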

Any suggestions on whether our data model should be denormalized, or follow the normalized/multi-get approach, which would then perhaps need a separate cache layer for reads?
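For the normalized/multi-get option, the separate cache layer could start as a read-through cache keyed by pin_id. The sketch below is an illustration only; `fetch_many` is a hypothetical stand-in for a batched read of the pin table. A pin update invalidates one cache entry instead of fanning out to every board that contains the pin.

```python
from typing import Callable, Dict, Iterable, List


class PinCache:
    """Minimal read-through cache for pin details (a sketch, not a production cache)."""

    def __init__(self, fetch_many: Callable[[List[str]], Dict[str, dict]]):
        # fetch_many: hypothetical batched lookup, pin_ids -> {pin_id: details}.
        self._fetch_many = fetch_many
        self._cache: Dict[str, dict] = {}

    def get_many(self, pin_ids: Iterable[str]) -> List[dict]:
        ids = list(pin_ids)
        missing = [pid for pid in ids if pid not in self._cache]
        if missing:
            # One batched fetch covers all cache misses for this page.
            self._cache.update(self._fetch_many(missing))
        return [self._cache[pid] for pid in ids if pid in self._cache]

    def invalidate(self, pin_id: str) -> None:
        # On a pin update (e.g. a new like count), drop the single cached copy.
        self._cache.pop(pin_id, None)


pin_table = {"p1": {"like_count": 1}, "p2": {"like_count": 5}}
calls = []


def fetch_many(ids):
    calls.append(list(ids))
    return {pid: dict(pin_table[pid]) for pid in ids if pid in pin_table}


cache = PinCache(fetch_many)
cache.get_many(["p1", "p2"])  # miss: fetches both
cache.get_many(["p1", "p2"])  # hit: no fetch
pin_table["p1"]["like_count"] = 2
cache.invalidate("p1")
print(cache.get_many(["p1", "p2"]))  # [{'like_count': 2}, {'like_count': 5}]
print(calls)  # [['p1', 'p2'], ['p1']]
```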

Thanks,

Ziju