cassandra-user mailing list archives

From aaron morton <>
Subject Re: Data Model Review
Date Wed, 19 Dec 2012 04:13:33 GMT
> I have heard it best to try and avoid the use of super columns for now. 

Your model makes sense. If you are creating the CF using the cassandra-cli you will probably
want to reverse order the column names see

If you want to use CQL 3 you could do something like this:

CREATE TABLE InstagramPhotos (
	user_name text,
	photo_seq timestamp,
	meta_1 text,
	meta_2 text,
	PRIMARY KEY (user_name, photo_seq)
);

That's pretty much the same. user_name is the row key, and photo_seq will be used as part
of a composite column name internally. 
(You can do the same thing without CQL, just look up composite columns)
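
To make the mapping above concrete, here is a rough sketch in Python of how such a table can be pictured as one wide row per user_name, with photo_seq as the first component of each composite column name. This is only a toy in-memory model, not how Cassandra is implemented; the WideRowStore class and its insert/slice methods are hypothetical names made up for illustration.

```python
class WideRowStore:
    """Toy model: row key -> {(photo_seq, column_name): value}."""

    def __init__(self):
        self._rows = {}

    def insert(self, user_name, photo_seq, **columns):
        # Each CQL column becomes one cell whose name is the
        # composite (photo_seq, column_name) inside the user's row.
        row = self._rows.setdefault(user_name, {})
        for column_name, value in columns.items():
            row[(photo_seq, column_name)] = value

    def slice(self, user_name, start_seq, end_seq, reverse=False):
        # Range query over the first component of the composite name.
        # reverse=True mimics a reversed column ordering (newest first).
        row = self._rows.get(user_name, {})
        names = sorted(
            (name for name in row if start_seq <= name[0] <= end_seq),
            reverse=reverse,
        )
        return [(name, row[name]) for name in names]
```

For example, after store.insert("adam", 1355800000, meta_1="a", meta_2="b"), calling store.slice("adam", 0, 2000000000) returns both cells for that photo, ordered by photo_seq and then by column name.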

You can do something similar for the annotations. 

Depending on your use case I would use UNIX epoch time if possible rather than a time uuid.
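
As a rough illustration of the difference, the Python sketch below converts a datetime to UNIX epoch milliseconds and recovers the same kind of value from a version 1 (time-based) UUID. It uses only the standard library; the helper names to_epoch_millis and timeuuid_to_epoch_millis are made up for this example, and milliseconds are assumed as the resolution.

```python
import uuid
from datetime import datetime, timezone

# Number of 100-nanosecond intervals between the UUID epoch
# (1582-10-15) and the UNIX epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def to_epoch_millis(dt):
    """Convert a timezone-aware datetime to UNIX epoch milliseconds."""
    return round(dt.timestamp() * 1000)


def timeuuid_to_epoch_millis(u):
    """Extract the embedded timestamp from a version 1 UUID."""
    if u.version != 1:
        raise ValueError("not a time-based (version 1) UUID")
    # u.time is in 100-ns units; 10,000 of those make one millisecond.
    return (u.time - UUID_EPOCH_OFFSET) // 10_000


posted = datetime(2012, 12, 19, 4, 13, 33, tzinfo=timezone.utc)
photo_seq = to_epoch_millis(posted)

generated = uuid.uuid1()
generated_millis = timeuuid_to_epoch_millis(generated)
```

The epoch value depends only on the photo's posted time, so it can be recomputed from the API data. A time UUID also carries node and clock-sequence bits, which keeps two entries with the same timestamp from colliding but makes it harder to derive the column name from the photo alone.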

Hope that helps. 

Aaron Morton
Freelance Cassandra Developer
New Zealand


On 18/12/2012, at 4:35 AM, Adam Venturella <> wrote:

> My use case is capturing some information about Instagram photos from the API. I have
2 use cases. One, I need to capture all of the media data for an account and two I need to
be able to privately annotate that data. There is some nuance in this, multiple http queries
for example, but ignoring that, and assuming I have obtained all of the data surrounding an
account's photos, here is how I was thinking of storing that information for use case 1. 
> ColumnFamily: InstagramPhotos
> Row Key: <account_username>
> Columns:   
> Column Name: <date_posted_timestamp>
> Column Value: JSON representing the data for the individual photo (filter, comments,
likes etc, not the binary photo data).
> So the idea would be to keep adding columns to the row that contain that serialized data
(in JSON) with their timestamps as the name.  Timestamps as the column names, I figure, should
help to perform range queries, where I make the 1st column inserted the earliest timestamp
and the last column inserted the most recent. I could probably also use TimeUUIDs here as
well since I will have things ordered prior to inserting.
> The question here, does this approach make sense? Is it common to store JSON in columns
like this? I know there are super columns as well, so I could use those I suppose instead
of JSON. The extra level of indexing would probably be useful to query specific photos for
use case 2. I have heard it best to try and avoid the use of super columns for now. I have
no information to back that claim up other than some time spent in the IRC. So feel free to
debunk that statement if it is false.
> So that is use case one, use case two covers the private annotations.
> I figured here:
> ColumnFamily: InstagramAnnotations
> row key:  Canonical Media Id
> Column Name: TimeUUID
> Column Value: JSON representing an annotation/internal comment
> Writing out the above I can actually see where I might need to tighten some things up
around how I store the photos. I am clearly missing an obvious connection between the InstagramPhotos
and the InstagramAnnotations, maybe super columns would help with the photos instead of JSON?
Otherwise I would need to build an index row where I tie the canonical photo id to a timestamp
(column name) in the InstagramPhotos. I could also try to figure out how to make a TimeUUID
of my own that can double as the media's canonical id or further look at Instagram's canonical
id for photos and see if it already counts up. In which case I could use that in place of
a timestamp.
> Anyway, I figured I would see if anyone might help flesh out other potential pitfalls
in the above. I am definitely new to cassandra and I am using this project as a way to learn
some more about assembling systems using it.
