cassandra-user mailing list archives

From William R Speirs
Subject Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?
Date Wed, 02 Feb 2011 15:49:01 GMT
Any time I see or hear "a single row containing all ..." I get nervous. That
single row is going to reside on a single node (plus its replicas). That is
potentially a lot of load (I don't know your system) for that node. Why
wouldn't you split it at least by user? If it won't be a lot of load, then why
are you using Cassandra? This seems like something that could easily fit into
a SQL/relational-style DB. If it's too much data (millions of users, 100s of
millions of reminders) for a standard SQL/relational model, then it's probably
too much for a single row.
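
A minimal sketch of the row-splitting idea, assuming the YYYYMMDD:user_id key
format suggested earlier in this thread. The function names and the
hash-modulo placement are hypothetical illustrations, not Cassandra's actual
partitioner or any client API; the point is only that many small row keys can
land on different nodes, while one big row cannot.

```python
import hashlib
from datetime import date, timedelta


def timeline_row_key(user_id: str, day: date) -> str:
    """Build a per-user, per-day row key in the form YYYYMMDD:user_id."""
    return f"{day:%Y%m%d}:{user_id}"


def recent_row_keys(user_id: str, today: date, days: int) -> list:
    """Row keys for the last `days` days, newest first."""
    return [timeline_row_key(user_id, today - timedelta(days=i)) for i in range(days)]


def toy_node_for_key(row_key: str, node_count: int) -> int:
    """Toy placement: hash the key and take it modulo the node count.

    This is an assumption for illustration only; a real cluster assigns
    token ranges to nodes rather than using a simple modulo.
    """
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % node_count


if __name__ == "__main__":
    for key in recent_row_keys("user42", date(2011, 2, 2), 3):
        print(key, "-> node", toy_node_for_key(key, 4))
```

Reading a user's timeline then means fetching a handful of daily rows instead
of slicing one ever-growing row.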

I'm not familiar with the TTL functionality of Cassandra... sorry, I cannot
help or comment there; I'm still learning :-)

Yea, my $0.02 is that this is an effective way to leverage super columns.
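
To make the two layouts from the original question concrete, here is a rough
in-memory sketch using plain Python dicts. Everything in it is an assumption
for illustration: the integer reminder IDs stand in for TimeUUIDs (larger
means newer), the tag names are made up, and JSON is just one possible
serialization. It does not use any Cassandra client API.

```python
import json

# Super column layout:
# {supercolumn name (reminder ID): {subcolumn name (tag): empty value}}
timeline_super = {
    101: {"work": "", "urgent": ""},
    102: {"home": ""},
    103: {"work": ""},
}

# Standard column layout: one column per reminder, with the tags
# serialized into the column value.
timeline_serialized = {
    rid: json.dumps(sorted(tags)) for rid, tags in timeline_super.items()
}


def filter_super(row, wanted_tags, limit=50):
    """Take the newest `limit` supercolumns, keep those with a wanted tag."""
    latest = sorted(row, reverse=True)[:limit]
    return [rid for rid in latest if not wanted_tags.isdisjoint(row[rid])]


def filter_serialized(row, wanted_tags, limit=50):
    """Same filter, but each column value must be deserialized first."""
    latest = sorted(row, reverse=True)[:limit]
    return [
        rid for rid in latest
        if not wanted_tags.isdisjoint(json.loads(row[rid]))
    ]


if __name__ == "__main__":
    print(filter_super(timeline_super, {"work"}))
    print(filter_serialized(timeline_serialized, {"work"}))
```

Both functions return the same reminder IDs, which would then be used as row
keys to look up the reminder details. Since all tags are written at once and
read together, the serialized form gives up little in this sketch; the super
column form keeps each tag visible as a subcolumn name.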


On 02/02/2011 10:43 AM, Aditya Narayan wrote:
> I think you understood exactly what I wanted to convey, except for a few
> things I want to clarify:
> I was thinking of a single row containing all reminders (and not split
> by day). The history of the reminders needs to be maintained for some time.
> After a certain time (say 3 or 6 months) they may be deleted by the TTL
> facility.
> "While presenting the reminders timeline to the user, latest
> supercolumns like around 50 from the start_end will be picked up and
> their subcolumns values will be compared to the Tags user has chosen
> to see and, corresponding to the filtered subcolumn values(tags), the
> rows of the reminder details would be picked up.."
> Is a supercolumn a preferable choice for this? Can there be a better
> schema than this?
> -Aditya Narayan
> On Wed, Feb 2, 2011 at 8:54 PM, William R Speirs wrote:
>> To reiterate, so I know we're both on the same page, your schema would be
>> something like this:
>> - A column family (as you describe) to store the details of a reminder. One
>> reminder per row. The row key would be a TimeUUID.
>> - A super column family to store the reminders for each user, for each day.
>> The row key would be something like: YYYYMMDD:user_id. The column names
>> would simply be the TimeUUID of the messages. The sub column names would be
>> the tag names of the various reminders.
>> The idea is that you would then get a slice of each row for a user, for a
>> day, that would only contain sub column names with the tags you're looking
>> for? Then based upon the column names returned, you'd look-up the reminders.
>> That seems like a solid schema to me.
>> Bill-
>> On 02/02/2011 09:37 AM, Aditya Narayan wrote:
>>> Actually, I am trying to use Cassandra to display to users of my
>>> application the list of all reminders they have set for
>>> themselves.
>>> I need to store rows containing the timeline of daily reminders set by
>>> the users, for themselves, in the application. The reminders need to be
>>> presented to the user in chronological order, like a news feed.
>>> Each reminder has certain tags associated with it (so that, at
>>> times, the user may also choose to see the reminders filtered by tags in
>>> chronological order).
>>> So I thought of a schema something like this:-
>>> - The details of each reminder may be stored as a separate row in a
>>> column family.
>>> - For presenting the timeline of reminders set by the user, the timeline
>>> row of each user would contain the IDs/keys (of the reminder rows) as
>>> the supercolumn names, and the subcolumns inside each supercolumn would
>>> contain the list of tags associated with that particular reminder. All
>>> tags are set at once during the first write. The number of tags
>>> (subcolumns) will be around 8 at most.
>>> Any comments, suggestions and feedback on the schema design are
>>> requested..
>>> Thanks
>>> Aditya Narayan
>>> On Wed, Feb 2, 2011 at 7:49 PM, Aditya Narayan wrote:
>>>> Hey all,
>>>> I need to store supercolumns each with around 8 subcolumns;
>>>> All the data for a supercolumn is written at once and all subcolumns
>>>> need to be retrieved together. The data in each subcolumn is not big,
>>>> it just contains keys to other rows.
>>>> Would it be preferred to have a supercolumn family or just a standard
>>>> column family containing "all the subcolumns data serialized in single
>>>> column(s) " ?
>>>> Thanks
>>>> Aditya Narayan
