From user-return-30740-apmail-cassandra-user-archive=cassandra.apache.org@cassandra.apache.org Fri Dec 21 15:16:16 2012 Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 6BC8999AD for ; Fri, 21 Dec 2012 15:16:16 +0000 (UTC) Received: (qmail 44266 invoked by uid 500); 21 Dec 2012 15:16:14 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 44254 invoked by uid 500); 21 Dec 2012 15:16:13 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 44238 invoked by uid 99); 21 Dec 2012 15:16:13 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 21 Dec 2012 15:16:13 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of aventurella@gmail.com designates 209.85.219.54 as permitted sender) Received: from [209.85.219.54] (HELO mail-oa0-f54.google.com) (209.85.219.54) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 21 Dec 2012 15:16:07 +0000 Received: by mail-oa0-f54.google.com with SMTP id n9so4771638oag.13 for ; Fri, 21 Dec 2012 07:15:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=references:from:in-reply-to:mime-version:date:message-id:subject:to :content-type; bh=AhAdId+SC1k8EleADgH3EM6flTmuBVpsyQbvJggjmII=; b=By17XmEGHJwoc5lVnf7mUVeUkta2Bn7GTMqOlMDEsj1TrL0edJM1MqkhHK+gtAbjGx h6F6n6jG0h3bmJ2WHoh6MlQ8yYecb1jA9unKBYf0LPGlZS3mXEsQLG1I+UD/mAsGuuNQ fDHGyGnMBDyBNlH9FEWH18XQNoDFoHclTA12RQP9pvLLrMLHs2Zv3AfhQeHo8De09hHq ZxvRHQMQg6SYH8eVJqNd+hZiuf4Hxwl/eU4ZlR10b9yqWLtTzrIJ+WfKSEHq+M5Bvc3+ ZD/7XklUYgTN1dTeHGR94ZrTqmcbg+4X4EHKxYwDm4ZYZyrmtfsWSi/Ut8nKOZLW/IfL JSNA== Received: by 10.60.3.193 with SMTP id e1mr11596702oee.39.1356102946049; Fri, 21 Dec 2012 07:15:46 -0800 (PST) References: <15D7F5AD4CB04DAA9E828F9F5367AF4B@dave.cx> From: Adam Venturella In-Reply-To: <15D7F5AD4CB04DAA9E828F9F5367AF4B@dave.cx> Mime-Version: 1.0 (1.0) Date: Fri, 21 Dec 2012 07:15:44 -0800 Message-ID: <-8847633933343020725@unknownmsgid> Subject: Re: Correct way to design a cassandra database To: "user@cassandra.apache.org" Content-Type: multipart/alternative; boundary=e89a8ff2563e60cdbf04d15e52e2 X-Virus-Checked: Checked by ClamAV on apache.org --e89a8ff2563e60cdbf04d15e52e2 Content-Type: text/plain; charset=ISO-8859-1 Hmmm it just occurred to me that in my examples, there is no convenient way to delete a photo and also remove that photo from the albums it is a part of. As it stands, you would need to iterate over all of the users albums to locate the photo and remove it; that's no good. Probably need another table that holds just the photo / album identifiers, an index. So when the user deletes a photo, you ask the index which albums that photo belongs too and just fetch those to update the album with that photo removed. :: mobile emails :: On Dec 21, 2012, at 3:50, David Mohl wrote: Hello! I've recently started learning cassandra but still have troubles understanding the best way to design a cassandra database. I've posted my question already on stackoverflow but because this would very likely result in a discussion, it got closed. Orginal question here: http://stackoverflow.com/questions/13975868/correct-way-to-design-a-cassandra-database Assuming you have 3 types of objects: User, Photo and Album. Obviously a photo belongs to a user and can be part of a album. For querying, assume we just want to order by "last goes first". Paging by 10 elements should be possible. Would you go like every document has all the informations needed for a correct output. Something like this: -- User | -- Name | -- ... | -- Photos | -- Photoname | -- Uploaded at Or go a more relational way (while having a secondary index on the "belongs_to" columns: -- User (userid is the row key) | -- Name | -- ... -- Photoid | -- belongs_to (userid) | -- belongs_to_album (albumid) | -- ... -- Albumid | -- belongs_to (userid) | -- ... Another way that came in my mind would be kind of a mix: -- User | -- Name | -- ... | -- Photoids (e.g. 1,2,3,4,5) | -- Albumids (e.g. 1,2,3,4,5) -- Photoid (photoid is the row key) | -- Name | -- Uploaded at | -- ... -- Albumid (albumid is the row key) | -- Name | -- Photoids (e.g. 1,2,3,4,5) | -- ... When using a random partitioner, the last example would be (IMO) the way to go. I can query the user object (out of a session id or something) and would get all the row keys I need for fetching photo / album data. However this would result in veeery large columns. Another down point would be inconsistency and identification problems. A photo (or a album) could not be identified by the row itself. Example: If I fetch a photo with ID 3456, I don't know in which albums it is part nor which user owns it. Adding this kind of information would result in a fairly large stack of points I have to alter on creation / update. The second example has all the information needed. However, if I want to fetch all photos that are part of album x, I have to query by a secondary index that COULD contain millions of entries over the whole cluster. And I guess I can forget the random partitioner on this example. Am I thinking to relational? It'd be great to hear some other opinions on this topic --- David --e89a8ff2563e60cdbf04d15e52e2 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
Hmmm it just occurred to me that i= n my examples, there is no convenient way to delete a photo and also remove= that photo from the albums it is a part of.

As it stands, you would need to iterate over all of the= users albums to locate the photo and remove it; that's no good.=A0

Probably need another table that holds just the photo= / album identifiers, an index. So when the user deletes a photo, you ask t= he index which albums that photo belongs too and just fetch those to update= the album with that photo removed.

:: mobile emails ::

On Dec 21, 2012, at 3:50, David Mohl = <dave@dave.cx> wrote:

Hello!

I've recently = started learning cassandra but still have troubles understanding the best w= ay to design a cassandra database.=A0
I've posted my question= already on stackoverflow but because this would very likely result in a di= scussion, it got closed. Orginal question here:=A0http://stackoverflow.com/questions/13975868/c= orrect-way-to-design-a-cassandra-database


Assuming you have 3 types of object= s: User, Photo and Album. Obviously a photo belongs to a user and can be pa= rt of a album. For querying, assume we just want to order by "last goe= s first". Paging by 10 elements should be possible.

Would you go like every document has all the informatio= ns needed for a correct output. Something like this:

=A0 =A0 -- User=A0
=A0 =A0 =A0 =A0| -- Name
=A0 =A0 = =A0 =A0| -- ...
=A0 =A0 =A0 =A0| -- Photos
=A0 =A0 =A0 =A0 =A0 =A0 | -- Phot= oname
=A0 =A0 =A0 =A0 =A0 =A0 | -- Uploaded at

Or go a more relational way (while having a secondary index on the &= quot;belongs_to" columns:

=A0 =A0 -- User (userid is the row key)=A0
= =A0 =A0 =A0 =A0| -- Name
=A0 =A0 =A0 =A0| -- ...

=A0 =A0 -- Photoid
=A0 =A0 =A0 =A0| -- belongs_to (useri= d)
=A0 =A0 =A0 =A0| -- belongs_to_album (albumid)
=A0 =A0 =A0 =A0| -- ...

=A0 =A0 -- Albumid
=A0 =A0 =A0 =A0| -- belongs_to (userid)
=A0 =A0 =A0 =A0| = -- ...

Another way that came in my mind would be k= ind of a mix:

=A0 =A0 -- User=A0
=A0 =A0 =A0 =A0| -- Name
<= div>=A0 =A0 =A0 =A0| -- ...
=A0 =A0 =A0 =A0| -- Photoids (e.g. 1,= 2,3,4,5)
=A0 =A0 =A0 =A0| -- Albumids (e.g. 1,2,3,4,5)
=
=A0 =A0 -- Photoid (photoid is the row key)=A0
=A0 =A0 =A0 =A0| -- Name
=A0 =A0 =A0 =A0| -- Uploaded at
=A0 =A0 =A0 =A0| -- ...

=A0 =A0 -- Albumid = (albumid is the row key)=A0
=A0 =A0 =A0 =A0| -- Name
= =A0 =A0 =A0 =A0| -- Photoids (e.g. 1,2,3,4,5)
=A0 =A0 =A0 =A0| -- ...

When using a random p= artitioner, the last example would be (IMO) the way to go. I can query the = user object (out of a session id or something) and would get all the row ke= ys I need for fetching photo / album data. However this would result in vee= ery large columns. Another down point would be inconsistency and identifica= tion problems. A photo (or a album) could not be identified by the row itse= lf.=A0
Example: If I fetch a photo with ID 3456, I don't know in wh= ich albums it is part nor which user owns it. Adding this kind of informati= on would result in a fairly large stack of points I have to alter on creati= on / update.

The second example has all the information needed. Howe= ver, if I want to fetch all photos that are part of album x, I have to quer= y by a secondary index that COULD contain millions of entries over the whol= e cluster. And I guess I can forget the random partitioner on this example.=

Am I thinking to relational?=A0
It'd be g= reat to hear some other opinions on this topic

---
David

--e89a8ff2563e60cdbf04d15e52e2--