incubator-cassandra-user mailing list archives

From Adam Venturella <aventure...@gmail.com>
Subject Re: Correct way to design a cassandra database
Date Fri, 21 Dec 2012 15:15:44 GMT
Hmmm it just occurred to me that in my examples, there is no convenient way
to delete a photo and also remove that photo from the albums it is a part
of.

As it stands, you would need to iterate over all of the user's albums to
locate the photo and remove it; that's no good.

Probably need another table that holds just the photo / album identifiers,
an index. So when the user deletes a photo, you ask the index which albums
that photo belongs to and fetch just those, then update each album with
that photo removed.
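
A minimal sketch of that index idea, in plain Python rather than against a
real cluster. The three dicts stand in for column families, and every name
here (photos, album_photos, photo_albums, delete_photo) is made up for
illustration, not taken from any Cassandra API:

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for three column families.
photos = {}                      # photo_id -> photo columns
album_photos = defaultdict(set)  # album_id -> ids of the photos in that album
photo_albums = defaultdict(set)  # photo_id -> ids of the albums holding it (the index)


def add_photo(photo_id, name):
    photos[photo_id] = {"name": name}


def add_to_album(photo_id, album_id):
    # Write both directions so the index never has to be rebuilt by scanning.
    album_photos[album_id].add(photo_id)
    photo_albums[photo_id].add(album_id)


def delete_photo(photo_id):
    # Ask the index which albums hold the photo, then touch only those albums.
    touched = sorted(photo_albums.pop(photo_id, set()))
    for album_id in touched:
        album_photos[album_id].discard(photo_id)
    photos.pop(photo_id, None)
    return touched
```

The cost is one extra write per album membership; in exchange, a delete
touches only the albums the index names instead of every album the user has.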

:: mobile emails ::

On Dec 21, 2012, at 3:50, David Mohl <dave@dave.cx> wrote:

Hello!

I've recently started learning cassandra but still have trouble
understanding the best way to design a cassandra database.
I've already posted my question on stackoverflow, but because it would
very likely result in a discussion, it got closed. Original question here:
http://stackoverflow.com/questions/13975868/correct-way-to-design-a-cassandra-database


Assume you have 3 types of objects: User, Photo and Album. Obviously a
photo belongs to a user and can be part of an album. For querying, assume we
just want to order by "last goes first" (newest first). Paging by 10
elements should be possible.
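
To make the "last goes first" paging requirement concrete, here is a rough
sketch in plain Python. It assumes each photo is stored under a sortable
(uploaded_at, photo_id) key, similar in spirit to columns kept in comparator
order inside one row; PhotoTimeline and its methods are hypothetical names,
not a Cassandra API:

```python
import bisect


class PhotoTimeline:
    """Keeps (uploaded_at, photo_id) pairs sorted, like columns in one wide row."""

    def __init__(self):
        self._entries = []  # ascending by (uploaded_at, photo_id)

    def add(self, uploaded_at, photo_id):
        bisect.insort(self._entries, (uploaded_at, photo_id))

    def page(self, before=None, limit=10):
        # `before` is the last entry of the previous page; None means start
        # from the newest entry. Results come back newest first.
        if before is None:
            end = len(self._entries)
        else:
            end = bisect.bisect_left(self._entries, before)
        start = max(0, end - limit)
        return self._entries[start:end][::-1]


timeline = PhotoTimeline()
for i in range(25):
    timeline.add(i, "photo-%d" % i)

first = timeline.page()                   # the 10 newest photos
second = timeline.page(before=first[-1])  # the next 10 older ones
```

Paging by "the last key I saw" rather than by a numeric offset means each
page is a bounded slice, no matter how many photos the user has.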

Would you design it so that every document has all the information needed
for a correct output? Something like this:

    -- User
       | -- Name
       | -- ...
       | -- Photos
            | -- Photoname
            | -- Uploaded at

Or go a more relational way (while having a secondary index on the
"belongs_to" columns):

    -- User (userid is the row key)
       | -- Name
       | -- ...

    -- Photoid
       | -- belongs_to (userid)
       | -- belongs_to_album (albumid)
       | -- ...

    -- Albumid
       | -- belongs_to (userid)
       | -- ...

Another way that came to mind would be kind of a mix:

    -- User
       | -- Name
       | -- ...
       | -- Photoids (e.g. 1,2,3,4,5)
       | -- Albumids (e.g. 1,2,3,4,5)

    -- Photoid (photoid is the row key)
       | -- Name
       | -- Uploaded at
       | -- ...

    -- Albumid (albumid is the row key)
       | -- Name
       | -- Photoids (e.g. 1,2,3,4,5)
       | -- ...

When using a random partitioner, the last example would be (IMO) the way to
go. I can query the user object (from a session id or something) and
get all the row keys I need for fetching photo / album data. However,
this would result in veeery large columns. Another downside would be
inconsistency and identification problems. A photo (or an album) could not
be identified by the row itself.
Example: If I fetch a photo with ID 3456, I don't know which albums it
is part of nor which user owns it. Adding this kind of information would
result in a fairly large number of places I have to alter on creation /
update.

The second example has all the information needed. However, if I want to
fetch all photos that are part of album x, I have to query by a secondary
index that COULD contain millions of entries over the whole cluster. And I
guess I can forget the random partitioner on this example.

Am I thinking too relational?
It'd be great to hear some other opinions on this topic.

---
David
