incubator-cassandra-user mailing list archives

From "Hiller, Dean" <Dean.Hil...@nrel.gov>
Subject Re: Correct way to design a cassandra database
Date Fri, 21 Dec 2012 13:07:08 GMT
If you have a way to partition tables, relational can be ok.  Think of a business that has trillions
of clients as customers, where each client has a whole slew of things it is related to.  Partitioning
by client can be a good way to go.  Here are some patterns we have seen in nosql; perhaps
they can help your situation:

https://github.com/deanhiller/playorm/wiki/Patterns-Page
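
One way to read the partitioning idea above, as a minimal Python sketch. It uses plain in-memory dictionaries rather than Cassandra or PlayOrm, and the names `partitions`, `put` and `photos_in_album` are made up for illustration. The assumption is that every row carries a client id and that relational-style lookups only ever run inside one client's partition:

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for a store partitioned by client id:
# partitions[client_id][table_name][row_key] -> row (a dict of columns).
partitions = defaultdict(lambda: defaultdict(dict))

def put(client_id, table, row_key, row):
    """Write a row into the partition that belongs to one client."""
    partitions[client_id][table][row_key] = row

def photos_in_album(client_id, album_id):
    """Relational-style lookup that only scans one client's partition."""
    photos = partitions[client_id]["photo"]
    return sorted(key for key, row in photos.items() if row["album_id"] == album_id)

put("client-a", "album", "al1", {"name": "Holiday"})
put("client-a", "photo", "p1", {"album_id": "al1"})
put("client-a", "photo", "p2", {"album_id": "al2"})
put("client-b", "photo", "p3", {"album_id": "al1"})

print(photos_in_album("client-a", "al1"))
```

The scan touches only the rows of `client-a`, so its cost depends on the size of one partition and not on the total number of clients.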

Later,
Dean

From: David Mohl <dave@dave.cx>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, December 21, 2012 4:49 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Correct way to design a cassandra database

Hello!

I've recently started learning cassandra but still have trouble understanding the best way
to design a cassandra database.
I already posted my question on stackoverflow, but because it would very likely result
in a discussion, it got closed. Original question here: http://stackoverflow.com/questions/13975868/correct-way-to-design-a-cassandra-database


Assume you have 3 types of objects: User, Photo and Album. Obviously a photo belongs to
a user and can be part of an album. For querying, assume we just want to order by "last goes
first" (i.e. newest first). Paging by 10 elements should be possible.
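
The paging requirement can be sketched in Python, taking "last goes first" to mean newest first. This is an illustration on a sorted in-memory list, not Cassandra code. The function name `page_newest_first` and the `(uploaded_at, photo_id)` column layout are assumptions. The idea is to keep entries sorted by upload time and to use the last entry of a page as the cursor for the next one:

```python
import bisect

def page_newest_first(columns, page_size=10, before=None):
    """Return up to page_size entries, newest first.

    columns: list of (uploaded_at, photo_id) tuples sorted ascending.
    before:  cursor; only entries strictly older than it are returned.
    """
    end = len(columns) if before is None else bisect.bisect_left(columns, before)
    start = max(0, end - page_size)
    return columns[start:end][::-1]

# 25 hypothetical photos with upload times 1..25.
columns = [(t, f"photo-{t}") for t in range(1, 26)]

first = page_newest_first(columns)
second = page_newest_first(columns, before=first[-1])
third = page_newest_first(columns, before=second[-1])
print(first[0], second[0], len(third))
```

Paging by cursor rather than by offset means each page is a single range read that starts where the previous one ended.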

Would you design it so that every document has all the information needed for a correct output?
Something like this:

    -- User
       | -- Name
       | -- ...
       | -- Photos
            | -- Photoname
            | -- Uploaded at

Or go a more relational way (while having a secondary index on the "belongs_to" columns):

    -- User (userid is the row key)
       | -- Name
       | -- ...

    -- Photoid
       | -- belongs_to (userid)
       | -- belongs_to_album (albumid)
       | -- ...

    -- Albumid
       | -- belongs_to (userid)
       | -- ...

Another way that came to mind would be kind of a mix:

    -- User
       | -- Name
       | -- ...
       | -- Photoids (e.g. 1,2,3,4,5)
       | -- Albumids (e.g. 1,2,3,4,5)

    -- Photoid (photoid is the row key)
       | -- Name
       | -- Uploaded at
       | -- ...

    -- Albumid (albumid is the row key)
       | -- Name
       | -- Photoids (e.g. 1,2,3,4,5)
       | -- ...

When using a random partitioner, the last example would be (IMO) the way to go. I can query
the user object (from a session id or something) and would get all the row keys I need for
fetching photo / album data. However, this would result in very large columns. Another
downside would be inconsistency and identification problems: a photo (or an album) could not be
identified by the row itself.
Example: If I fetch a photo with ID 3456, I don't know which albums it is part of nor which
user owns it. Adding this kind of information would result in a fairly large number of places
I have to alter on creation / update.
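
The write fan-out can be made concrete with a small Python sketch of the mixed layout, using dictionaries in place of column families. The back-reference fields `owner` and `albums` on the photo row are hypothetical additions, as is the helper `add_photo`. The sketch only counts how many rows one upload has to touch:

```python
# Dictionaries standing in for the three column families of the mixed layout.
users = {"u1": {"name": "David", "photo_ids": [], "album_ids": ["a1"]}}
albums = {"a1": {"name": "Holiday", "photo_ids": []}}
photos = {}

def add_photo(photo_id, name, user_id, album_id):
    """Store a photo plus its references; return the number of rows written."""
    # Row 1: the photo itself, with back-references (hypothetical fields)
    # so it can be identified without first reading the user row.
    photos[photo_id] = {"name": name, "owner": user_id, "albums": [album_id]}
    # Row 2: the owner's list of photo ids.
    users[user_id]["photo_ids"].append(photo_id)
    # Row 3: the album's list of photo ids.
    albums[album_id]["photo_ids"].append(photo_id)
    return 3

writes = add_photo("3456", "beach.jpg", "u1", "a1")
print(writes, photos["3456"]["owner"], photos["3456"]["albums"])
```

Each of the three writes can fail or lag independently, which is where the inconsistency concern comes from.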

The second example has all the information needed. However, if I want to fetch all photos
that are part of album x, I have to query by a secondary index that COULD contain millions
of entries over the whole cluster. And I guess I can forget the random partitioner on this
example.
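
A rough Python model of the secondary-index concern, under two stated assumptions: rows are spread over nodes by a hash of the row key (as a random partitioner does, simplified here to a modulo), and each node only indexes the rows it stores. The cluster size `NODES` and all function names are made up for illustration. Under these assumptions the index works with hashed placement, but a query by album has to ask every node:

```python
import hashlib

NODES = 4  # hypothetical cluster size

def node_for(row_key):
    """Pick a node from an MD5 hash of the row key (simplified placement)."""
    digest = hashlib.md5(row_key.encode()).hexdigest()
    return int(digest, 16) % NODES

# One local index per node: album id -> photo ids stored on that node.
local_indexes = [dict() for _ in range(NODES)]

def insert_photo(photo_id, album_id):
    index = local_indexes[node_for(photo_id)]
    index.setdefault(album_id, []).append(photo_id)

def photos_in_album(album_id):
    """Query by album: every node's local index has to be consulted."""
    hits, nodes_asked = [], 0
    for index in local_indexes:
        nodes_asked += 1
        hits.extend(index.get(album_id, []))
    return sorted(hits), nodes_asked

for i in range(20):
    insert_photo(f"photo-{i}", "album-x" if i % 2 == 0 else "album-y")

found, nodes_asked = photos_in_album("album-x")
print(len(found), nodes_asked)
```

A lookup by row key goes to one node, while the lookup by album fans out to all of them regardless of how few photos the album holds.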

Am I thinking too relational?
It'd be great to hear some other opinions on this topic.

---
David

