Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 171FED1D2 for ; Fri, 21 Dec 2012 11:50:30 +0000 (UTC) Received: (qmail 94747 invoked by uid 500); 21 Dec 2012 11:50:27 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 94663 invoked by uid 500); 21 Dec 2012 11:50:27 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 94640 invoked by uid 99); 21 Dec 2012 11:50:26 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 21 Dec 2012 11:50:26 +0000 X-ASF-Spam-Status: No, hits=2.2 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_NEUTRAL X-Spam-Check-By: apache.org Received-SPF: neutral (nike.apache.org: local policy) Received: from [209.85.214.47] (HELO mail-bk0-f47.google.com) (209.85.214.47) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 21 Dec 2012 11:50:18 +0000 Received: by mail-bk0-f47.google.com with SMTP id j4so2277864bkw.6 for ; Fri, 21 Dec 2012 03:49:58 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=x-received:date:from:to:message-id:subject:x-mailer:mime-version :content-type:x-gm-message-state; bh=NyzheagvGIyUzrEkw6TwOxFBVC/PlJBqDU4drEqB2Eg=; b=BjSAVGp2u+Wik4vYdHExISXXUUP5HiRAeE9LS0KkwxE7YgR4XqZxYXIvTipbo+K1c/ 8hiDtxI6d2S8NlgEXFRzIeKiZRqmMRUQka3UT2SyzCtBJTskDzlXsn+pnib4XD1xvRFp VFH+tQmBcnmgKJOOxsrjxUTgpIEfJUoOLAq7gTH9uHwXLvTIlAQsPoTUBUE6Oew/UsMD TdB46sKzW7l8x/px1GD83j/iIzj0wKcP5QKP2D02dR5opVak8ZedVE+oo15Uy5bJxrUD YW231w8Plnx551Pouiqpe0BKsS3ydQ6W7kYaY8Bi+02EgUAg7ntWv9mBpKKp51lWjKiU vH0w== X-Received: by 10.204.147.216 with SMTP id m24mr6506930bkv.63.1356090598089; Fri, 21 Dec 2012 03:49:58 -0800 (PST) Received: from [192.168.0.4] (109.125.120.225.dynamic.cablesurf.de. [109.125.120.225]) by mx.google.com with ESMTPS id f24sm9640421bkv.7.2012.12.21.03.49.56 (version=TLSv1/SSLv3 cipher=OTHER); Fri, 21 Dec 2012 03:49:57 -0800 (PST) Date: Fri, 21 Dec 2012 12:49:55 +0100 From: David Mohl To: user@cassandra.apache.org Message-ID: <15D7F5AD4CB04DAA9E828F9F5367AF4B@dave.cx> Subject: Correct way to design a cassandra database X-Mailer: sparrow 1.6.4 (build 1178) MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="50d44ce3_1eba5d23_4c3" X-Gm-Message-State: ALoCoQl0VaII1rXZlOLYzSkmvcw0LAH0/2fijcWn5srkkC74yWd8zQUsbed2cONTzI1mViJql+7x X-Virus-Checked: Checked by ClamAV on apache.org --50d44ce3_1eba5d23_4c3 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Content-Disposition: inline Hello! I've recently started learning cassandra but still have troubles understanding the best way to design a cassandra database. I've posted my question already on stackoverflow but because this would very likely result in a discussion, it got closed. Orginal question here: http://stackoverflow.com/questions/13975868/correct-way-to-design-a-cassandra-database Assuming you have 3 types of objects: User, Photo and Album. Obviously a photo belongs to a user and can be part of a album. For querying, assume we just want to order by "last goes first". Paging by 10 elements should be possible. Would you go like every document has all the informations needed for a correct output. Something like this: -- User | -- Name | -- ... | -- Photos | -- Photoname | -- Uploaded at Or go a more relational way (while having a secondary index on the "belongs_to" columns: -- User (userid is the row key) | -- Name | -- ... -- Photoid | -- belongs_to (userid) | -- belongs_to_album (albumid) | -- ... -- Albumid | -- belongs_to (userid) | -- ... Another way that came in my mind would be kind of a mix: -- User | -- Name | -- ... | -- Photoids (e.g. 1,2,3,4,5) | -- Albumids (e.g. 1,2,3,4,5) -- Photoid (photoid is the row key) | -- Name | -- Uploaded at | -- ... -- Albumid (albumid is the row key) | -- Name | -- Photoids (e.g. 1,2,3,4,5) | -- ... When using a random partitioner, the last example would be (IMO) the way to go. I can query the user object (out of a session id or something) and would get all the row keys I need for fetching photo / album data. However this would result in veeery large columns. Another down point would be inconsistency and identification problems. A photo (or a album) could not be identified by the row itself. Example: If I fetch a photo with ID 3456, I don't know in which albums it is part nor which user owns it. Adding this kind of information would result in a fairly large stack of points I have to alter on creation / update. The second example has all the information needed. However, if I want to fetch all photos that are part of album x, I have to query by a secondary index that COULD contain millions of entries over the whole cluster. And I guess I can forget the random partitioner on this example. Am I thinking to relational? It'd be great to hear some other opinions on this topic --- David --50d44ce3_1eba5d23_4c3 Content-Type: text/html; charset="utf-8" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline
Hello=21

I've recently = started learning cassandra but still have troubles understanding the best= way to design a cassandra database. 
I've posted my quest= ion already on stackoverflow but because this would very likely result in= a discussion, it got closed. Orginal question here: http://stackoverfl= ow.com/questions/13975868/correct-way-to-design-a-cassandra-database<= /div>


Assuming you have 3 types of= objects: User, Photo and Album. Obviously a photo belongs to a user and = can be part of a album. =46or querying, assume we just want to order by =22= last goes first=22. Paging by 10 elements should be possible.
<= br>
Would you go like every document has all the informations n= eeded for a correct output. Something like this:

    -- User 
       =7C -- = Name
       =7C -- ...
  &nb= sp;    =7C -- Photos
        &nbs= p;   =7C -- Photoname
          &= nbsp; =7C -- Uploaded at

Or go a more relational= way (while having a secondary index on the =22belongs=5Fto=22 columns:

    -- User (userid is the row key)&nbs= p;
       =7C -- Name
  &nbs= p;    =7C -- ...

    -- Phot= oid
       =7C -- belongs=5Fto (userid)
       =7C -- belongs=5Fto=5Falbum (albumid)
       =7C -- ...

&= nbsp;   -- Albumid
       =7C -- belon= gs=5Fto (userid)
       =7C -- ...

Another way that came in my mind would be kind of a mix:=

    -- User 
  &n= bsp;    =7C -- Name
       =7C --= ...
       =7C -- Photoids (e.g. 1,2,3,4,5= )
       =7C -- Albumids (e.g. 1,2,3,4,5)

    -- Photoid (photoid is the row key)=  
       =7C -- Name
  =      =7C -- Uploaded at
      &nb= sp;=7C -- ...

    -- Albumid (albumid = is the row key) 
       =7C -- Name
       =7C -- Photoids (e.g. 1,2,3,4,5)
=
       =7C -- ...

When = using a random partitioner, the last example would be (IMO) the way to go= . I can query the user object (out of a session id or something) and woul= d get all the row keys I need for fetching photo / album data. However th= is would result in veeery large columns. Another down point would be inco= nsistency and identification problems. A photo (or a album) could not be = identified by the row itself. 
Example: If I fetch a= photo with ID 3456, I don't know in which albums it is part nor which us= er owns it. Adding this kind of information would result in a fairly larg= e stack of points I have to alter on creation / update.

The second example has all the information needed. However, if I = want to fetch all photos that are part of album x, I have to query by a s= econdary index that COULD contain millions of entries over the whole clus= ter. And I guess I can forget the random partitioner on this example.

Am I thinking to relational=3F 
It'd= be great to hear some other opinions on this topic

<= /div>
---
David

--50d44ce3_1eba5d23_4c3--