incubator-cassandra-user mailing list archives

From Adam Venturella <>
Subject Re: Correct way to design a cassandra database
Date Fri, 21 Dec 2012 15:15:44 GMT
Hmmm it just occurred to me that in my examples, there is no convenient way
to delete a photo and also remove that photo from the albums it is a part of.

As it stands, you would need to iterate over all of the user's albums to
locate the photo and remove it; that's no good.

Probably need another table that holds just the photo / album identifiers,
an index. So when the user deletes a photo, you ask the index which albums
that photo belongs to and fetch just those, updating each album with that
photo removed.
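
A minimal sketch of that index idea, using plain Python dictionaries to stand in for the two tables. The names `albums`, `photo_albums`, `add_photo_to_album`, and `delete_photo` are hypothetical, and a real Cassandra schema would have to keep the two tables in sync across separate writes.

```python
from collections import defaultdict

# album id -> set of photo ids (stands in for the album table; assumed layout)
albums = defaultdict(set)
# photo id -> set of album ids (the extra index table; assumed layout)
photo_albums = defaultdict(set)


def add_photo_to_album(photo_id, album_id):
    """Write both the album entry and the reverse index entry."""
    albums[album_id].add(photo_id)
    photo_albums[photo_id].add(album_id)


def delete_photo(photo_id):
    """Remove a photo from every album it is in, without scanning all albums."""
    touched = photo_albums.pop(photo_id, set())
    for album_id in touched:
        albums[album_id].discard(photo_id)
    return touched


add_photo_to_album("p1", "a1")
add_photo_to_album("p1", "a2")
add_photo_to_album("p2", "a1")
removed_from = delete_photo("p1")
```

The delete only touches the albums listed in the index, so its cost depends on how many albums contain the photo rather than on how many albums the user has.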

:: mobile emails ::

On Dec 21, 2012, at 3:50, David Mohl <> wrote:


I've recently started learning Cassandra but still have trouble
understanding the best way to design a Cassandra database.
I've already posted my question on Stack Overflow, but because it would
very likely result in a discussion, it got closed. Original question here:

Assuming you have 3 types of objects: User, Photo and Album. Obviously a
photo belongs to a user and can be part of an album. For querying, assume we
just want to order by "last goes first". Paging by 10 elements should be

Would you design it so that every document holds all the information needed
to render the output? Something like this:

    -- User
       | -- Name
       | -- ...
       | -- Photos
            | -- Photoname
            | -- Uploaded at
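
To make this first layout concrete, here is a rough Python sketch of one user row holding its photos, read back newest first in pages of 10. The `user_row` structure and the `page_photos` helper are illustrative assumptions, not a Cassandra API, and the sort at read time is only a stand-in for whatever ordering the store would provide.

```python
from datetime import datetime, timedelta

# One denormalized user row: every photo lives inside the user's document.
base = datetime(2012, 12, 1)
user_row = {
    "name": "example user",
    "photos": [
        {"photoname": f"photo-{i}", "uploaded_at": base + timedelta(hours=i)}
        for i in range(25)
    ],
}


def page_photos(row, page, page_size=10):
    """Return one page of photos, newest upload first ("last goes first")."""
    newest_first = sorted(
        row["photos"], key=lambda p: p["uploaded_at"], reverse=True
    )
    start = page * page_size
    return newest_first[start:start + page_size]


first_page = page_photos(user_row, 0)
last_page = page_photos(user_row, 2)
```

With 25 photos, pages 0 and 1 are full and page 2 holds the remaining five oldest uploads.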

Or go a more relational way (while having a secondary index on the
"belongs_to" columns):

    -- User (userid is the row key)
       | -- Name
       | -- ...

    -- Photoid
       | -- belongs_to (userid)
       | -- belongs_to_album (albumid)
       | -- ...

    -- Albumid
       | -- belongs_to (userid)
       | -- ...
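
As a rough illustration of what the secondary indexes in this layout would be doing, the Python sketch below builds an in-memory index over the "belongs_to" style columns. The `photos` data and the `build_index` helper are made up for the example; a real secondary index is maintained by the database and spread over the cluster.

```python
# Photo rows keyed by photo id, mirroring the layout above (sample data).
photos = {
    "p1": {"belongs_to": "u1", "belongs_to_album": "a1"},
    "p2": {"belongs_to": "u1", "belongs_to_album": "a2"},
    "p3": {"belongs_to": "u2", "belongs_to_album": "a1"},
}


def build_index(rows, column):
    """Map each value of `column` to the row keys that hold that value."""
    index = {}
    for key, row in rows.items():
        index.setdefault(row[column], []).append(key)
    return index


by_album = build_index(photos, "belongs_to_album")
by_user = build_index(photos, "belongs_to")

# "All photos in album a1" and "all photos of user u1" become index lookups.
photos_in_a1 = by_album.get("a1", [])
photos_of_u1 = by_user.get("u1", [])
```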

Another way that came to mind would be kind of a mix:

    -- User
       | -- Name
       | -- ...
       | -- Photoids (e.g. 1,2,3,4,5)
       | -- Albumids (e.g. 1,2,3,4,5)

    -- Photoid (photoid is the row key)
       | -- Name
       | -- Uploaded at
       | -- ...

    -- Albumid (albumid is the row key)
       | -- Name
       | -- Photoids (e.g. 1,2,3,4,5)
       | -- ...
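
The read path for this mixed layout can be sketched in Python as a two-step lookup: fetch the user (or album) row, then fetch each photo by its row key. The sample data and the helpers `photos_for_user` and `photos_for_album` are hypothetical and only model the access pattern, not an actual Cassandra client.

```python
# Sample rows for the mixed layout above (all values are made up).
users = {"u1": {"name": "example user", "photoids": [1, 2, 3], "albumids": [10]}}
photos = {
    1: {"name": "beach", "uploaded_at": "2012-12-01"},
    2: {"name": "city", "uploaded_at": "2012-12-05"},
    3: {"name": "forest", "uploaded_at": "2012-12-09"},
}
albums = {10: {"name": "holiday", "photoids": [1, 3]}}


def photos_for_user(user_id):
    """Step 1: read the user row. Step 2: look up each photo by row key."""
    ids = users[user_id]["photoids"]
    return [photos[pid] for pid in ids if pid in photos]


def photos_for_album(album_id):
    """Same two-step read, starting from the album row instead."""
    ids = albums[album_id]["photoids"]
    return [photos[pid] for pid in ids if pid in photos]


user_photos = photos_for_user("u1")
album_photos = photos_for_album(10)
```

Note that a photo row here carries no owner or album reference, so going from a photo back to its user or albums is not possible without extra data.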

When using a random partitioner, the last example would be (IMO) the way to
go. I can query the user object (from a session ID or something) and
get all the row keys I need for fetching photo / album data. However,
this would result in veeery large columns. Another downside would be
inconsistency and identification problems: a photo (or an album) could not
be identified by the row itself.
Example: If I fetch a photo with ID 3456, I don't know which albums it
is part of, nor which user owns it. Adding this kind of information would
result in a fairly large stack of points I have to alter on creation /

The second example has all the information needed. However, if I want to
fetch all photos that are part of album x, I have to query by a secondary
index that COULD contain millions of entries over the whole cluster. And I
guess I can forget the random partitioner on this example.

Am I thinking too relational?
It'd be great to hear some other opinions on this topic.

