incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Adam Venturella <aventure...@gmail.com>
Subject Re: Correct way to design a cassandra database
Date Fri, 21 Dec 2012 21:24:25 GMT
One more link that might be helpful. It's a similar system to photo's but
instead of Photos/Albums it's Songs/Playlists:

http://www.datastax.com/dev/blog/cql3-for-cassandra-experts.

It's not exactly 1:1 but it covers related concepts in making it work.



On Fri, Dec 21, 2012 at 8:02 AM, Adam Venturella <aventurella@gmail.com>wrote:

> Ok.. So here is my latest thinking... Including that index:
>
> CREATE TABLE Users (
>     user_name text,
>     password text,
>     PRIMARY KEY (user_name)
> );
>
> ^ Same as before
>
> CREATE TABLE Photos(
>     user_name text,
>     photo_id uuid,
>     created_time timestamp,
>     data text,
>     PRIMARY KEY (user_name, photo_id, created_time)
> ) WITH CLUSTERING ORDER BY (created_time DESC);
>
> ^ Note the addition of a photo id and using that in the PK def with the
> created_time
> Data is a JSON like this:
> {
>     "thumbnail": <url>,
>     "standard_resolution":<url>
> }
>
>
> CREATE TABLE PhotosAlbums (
>     user_name text,
>     album_name text,
>     poster_image_url text,
>     data text
>     PRIMARY KEY (user_name, album_name)
> );
>
> ^ Same as before, data represents a JSON array of the photos:
> [{"photo_id":..., "thumbnail":<url>, "standard_resolution":<url>},
> {"photo_id":..., "thumbnail":<url>, "standard_resolution":<url>},
> {"photo_id":..., "thumbnail":<url>, "standard_resolution":<url>},
> {"photo_id":..., "thumbnail":<url>, "standard_resolution":<url>}]
>
> CREATE TABLE PhotosAlbumsIndex (
>     user_name text,
>     photo_id uuid,
>     album_name text,
>     created_time timestamp
>     PRIMARY KEY (user_name, photo_id, album_name)
> );
>
> The create_time column here is because you need to have at least 1 column
> that is not part of the PK. Or that's what it looks like in my quick test.
>
> ^ Each photo added to an album needs to be added to this index row
>
>
> As before, your application will need to keep the order of the array in
> tact as your users modify the order of things. Now however if they delete a
> photo you need to fetch the PhotoAlbums the photo existed in and update
> them accordingly:
>
> SELECT * FROM PhotosAlbumsIndex WHERE user_name='the_user' AND
> photo_id="<uuid>"
>
> This should return to you all of the albums that the photo was a part of.
> Now you need to:
>
> SELECT * FROM PhotosAlbums where user_name = "the_user" and album_name IN
> ("name1", "name2", "name3" )
>
> name1,2,3 are the album names you selected from the PhotosAlbumsIndex query
>
> So now you have all of the photo albums, you would then iterate over those
> in your application, deserializing the JSON data locating the photo ID was
> was removed and taking it out of the array, then reserializing to JSON and
> updating the record.
>
> When that is complete you need to remove the Photo from the
> PhotosAlbumsIndex. Now there is where I het stuck a little.. because this
> will fail:
>
> DELETE FROM PhotosAlbumsIndex WHERE user_name='the_user' AND
> photo_id=<uuid>;
>
> It seems to want the album name as well since it's part of the PK.
> Admittedly, I don't know how to get around that and just delete everything
> where the first 2 components of the PK are true.
>
> You would already possess the list of album names though, so it could be
> BATCH that you need to perform for the deletes, specifying 1 delete per
> album_name.
>
>
> Anyway, that's my current thinking. I would love to know if it's possible
> to get around the DELETE issue another way.
>
>
>
> On Fri, Dec 21, 2012 at 7:15 AM, Adam Venturella <aventurella@gmail.com>wrote:
>
>> Hmmm it just occurred to me that in my examples, there is no convenient
>> way to delete a photo and also remove that photo from the albums it is a
>> part of.
>>
>> As it stands, you would need to iterate over all of the users albums to
>> locate the photo and remove it; that's no good.
>>
>> Probably need another table that holds just the photo / album
>> identifiers, an index. So when the user deletes a photo, you ask the index
>> which albums that photo belongs too and just fetch those to update the
>> album with that photo removed.
>>
>> :: mobile emails ::
>>
>> On Dec 21, 2012, at 3:50, David Mohl <dave@dave.cx> wrote:
>>
>>  Hello!
>>
>> I've recently started learning cassandra but still have troubles
>> understanding the best way to design a cassandra database.
>> I've posted my question already on stackoverflow but because this would
>> very likely result in a discussion, it got closed. Orginal question here:
>> http://stackoverflow.com/questions/13975868/correct-way-to-design-a-cassandra-database
>>
>>
>> Assuming you have 3 types of objects: User, Photo and Album. Obviously a
>> photo belongs to a user and can be part of a album. For querying, assume we
>> just want to order by "last goes first". Paging by 10 elements should be
>> possible.
>>
>> Would you go like every document has all the informations needed for a
>> correct output. Something like this:
>>
>>     -- User
>>        | -- Name
>>        | -- ...
>>        | -- Photos
>>             | -- Photoname
>>             | -- Uploaded at
>>
>> Or go a more relational way (while having a secondary index on the
>> "belongs_to" columns:
>>
>>     -- User (userid is the row key)
>>        | -- Name
>>        | -- ...
>>
>>     -- Photoid
>>        | -- belongs_to (userid)
>>        | -- belongs_to_album (albumid)
>>        | -- ...
>>
>>     -- Albumid
>>        | -- belongs_to (userid)
>>        | -- ...
>>
>> Another way that came in my mind would be kind of a mix:
>>
>>     -- User
>>        | -- Name
>>        | -- ...
>>        | -- Photoids (e.g. 1,2,3,4,5)
>>        | -- Albumids (e.g. 1,2,3,4,5)
>>
>>     -- Photoid (photoid is the row key)
>>        | -- Name
>>        | -- Uploaded at
>>        | -- ...
>>
>>     -- Albumid (albumid is the row key)
>>        | -- Name
>>        | -- Photoids (e.g. 1,2,3,4,5)
>>        | -- ...
>>
>> When using a random partitioner, the last example would be (IMO) the way
>> to go. I can query the user object (out of a session id or something) and
>> would get all the row keys I need for fetching photo / album data. However
>> this would result in veeery large columns. Another down point would be
>> inconsistency and identification problems. A photo (or a album) could not
>> be identified by the row itself.
>> Example: If I fetch a photo with ID 3456, I don't know in which albums it
>> is part nor which user owns it. Adding this kind of information would
>> result in a fairly large stack of points I have to alter on creation /
>> update.
>>
>> The second example has all the information needed. However, if I want to
>> fetch all photos that are part of album x, I have to query by a secondary
>> index that COULD contain millions of entries over the whole cluster. And I
>> guess I can forget the random partitioner on this example.
>>
>> Am I thinking to relational?
>> It'd be great to hear some other opinions on this topic
>>
>> ---
>> David
>>
>>
>

Mime
View raw message