cassandra-user mailing list archives

From William R Speirs <>
Subject Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?
Date Wed, 02 Feb 2011 16:41:19 GMT
I did not understand before... sorry.

Again, depending upon how many reminders you have for a single user, this could 
be a long/wide row. It really comes down to how many reminders we are 
talking about and how often they will be read/written. While a single row can 
contain millions (maybe more) of columns, that doesn't mean it's a good idea.

I'm working on a logging system with Cassandra and ran into this same type of 
problem. Do I put all of the messages for a single system into a single row 
keyed off that system's name? I quickly came to the answer of "no" and now I 
break my row keys into POSIX_timestamp:system where my timestamps are buckets 
for every 5 minutes. This nicely distributes the load across the nodes in my system.
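
A minimal sketch of that kind of bucketed row key, for illustration only. The
function names, the choice to use the start of the 5-minute window as the
bucket value, and the example system name are all assumptions, not details of
the actual logging system:

```python
BUCKET_SECONDS = 5 * 60  # 5-minute buckets, as described above


def bucket_row_key(system: str, ts: float, bucket_seconds: int = BUCKET_SECONDS) -> str:
    """Build a row key of the form POSIX_timestamp:system.

    Assumption: the timestamp part is the start of the bucket that contains ts.
    """
    seconds = int(ts)
    bucket_start = seconds - (seconds % bucket_seconds)
    return f"{bucket_start}:{system}"


def row_keys_for_range(system: str, start_ts: float, end_ts: float,
                       bucket_seconds: int = BUCKET_SECONDS) -> list:
    """List every row key that has to be read to cover [start_ts, end_ts]."""
    start = int(start_ts)
    first_bucket = start - (start % bucket_seconds)
    return [f"{bucket}:{system}"
            for bucket in range(first_bucket, int(end_ts) + 1, bucket_seconds)]


if __name__ == "__main__":
    # "web01" is a made-up system name.
    print(bucket_row_key("web01", 1296664879))
    print(row_keys_for_range("web01", 1296664800, 1296665400))
```

The trade-off is that reading a time range for one system means fetching
several rows (one per bucket) instead of slicing a single wide row.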


On 02/02/2011 11:18 AM, Aditya Narayan wrote:
> Perhaps you misunderstood me.
> I am already splitting the rows on a per-user basis, of course; otherwise
> the schema won't make sense for my usage. The row contains only
> *reminders of a single user*, sorted in chronological order. The
> reminder IDs are stored as supercolumn names, and the subcolumns contain
> the tags for that reminder.
> On Wed, Feb 2, 2011 at 9:19 PM, William R Speirs<>  wrote:
>> Any time I see/hear "a single row containing all ..." I get nervous. That
>> single row is going to reside on a single node. That is potentially a lot of
>> load (don't know the system) for that single node. Why wouldn't you split it
>> by at least user? If it won't be a lot of load, then why are you using
>> Cassandra? This seems like something that could easily fit into an
>> SQL/relational style DB. If it's too much data (millions of users, 100s of
>> millions of reminders) for a standard SQL/relational model, then it's
>> probably too much for a single row.
>> I'm not familiar with the TTL functionality of Cassandra... sorry cannot
>> help/comment there, still learning :-)
>> Yea, my $0.02 is that this is an effective way to leverage super columns.
>> Bill-
>> On 02/02/2011 10:43 AM, Aditya Narayan wrote:
>>> I think you got exactly what I wanted to convey, except for a few
>>> things I want to clarify:
>>> I was thinking of a single row containing all reminders (and not split
>>> by day). The history of the reminders needs to be maintained for some
>>> time. After a certain time (say 3 or 6 months) they may be deleted by
>>> the TTL facility.
>>> "While presenting the reminders timeline to the user, latest
>>> supercolumns like around 50 from the start_end will be picked up and
>>> their subcolumns values will be compared to the Tags user has chosen
>>> to see and, corresponding to the filtered subcolumn values(tags), the
>>> rows of the reminder details would be picked up.."
>>> Is a supercolumn a preferable choice for this? Can there be a better
>>> schema than this?
>>> -Aditya Narayan
>>> On Wed, Feb 2, 2011 at 8:54 PM, William R Speirs<> wrote:
>>>> To reiterate, so I know we're both on the same page, your schema would be
>>>> something like this:
>>>> - A column family (as you describe) to store the details of a reminder.
>>>>   One reminder per row. The row key would be a TimeUUID.
>>>> - A super column family to store the reminders for each user, for each
>>>>   day. The row key would be something like: YYYYMMDD:user_id. The column
>>>>   names would simply be the TimeUUID of the messages. The sub column
>>>>   names would be the tag names of the various reminders.
>>>> The idea is that you would then get a slice of each row for a user, for a
>>>> day, that would only contain sub column names with the tags you're
>>>> looking for? Then based upon the column names returned, you'd look-up
>>>> the reminders.
>>>> That seems like a solid schema to me.
>>>> Bill-
>>>> On 02/02/2011 09:37 AM, Aditya Narayan wrote:
>>>>> Actually, I am trying to use Cassandra to display to the users of my
>>>>> application the list of all reminders they have set for themselves.
>>>>> I need to store rows containing the timeline of daily reminders set by
>>>>> the users for themselves. The reminders need to be presented to the
>>>>> user in chronological order, like a news feed.
>>>>> Each reminder has certain tags associated with it (so that, at
>>>>> times, the user may also choose to see the reminders filtered by tag,
>>>>> in chronological order).
>>>>> So I thought of a schema something like this:-
>>>>> - The details of each reminder may be stored as a separate row in a
>>>>>   column family.
>>>>> - To present the timeline of reminders to the user, the timeline row
>>>>>   of each user would contain the IDs/keys (of the reminder rows) as the
>>>>>   supercolumn names, and the subcolumns inside each supercolumn would
>>>>>   contain the list of tags associated with that reminder. All tags
>>>>>   are set at once during the first write. The number of tags
>>>>>   (subcolumns) will be around 8 at most.
>>>>> Any comments, suggestions and feedback on the schema design are
>>>>> welcome.
>>>>> Thanks
>>>>> Aditya Narayan
>>>>> On Wed, Feb 2, 2011 at 7:49 PM, Aditya Narayan<> wrote:
>>>>>> Hey all,
>>>>>> I need to store supercolumns each with around 8 subcolumns;
>>>>>> All the data for a supercolumn is written at once and all subcolumns
>>>>>> need to be retrieved together. The data in each subcolumn is not
>>>>>> it just contains keys to other rows.
>>>>>> Would it be preferred to have a supercolumn family or just a standard
>>>>>> column family containing "all the subcolumns data serialized in single
>>>>>> column(s) " ?
>>>>>> Thanks
>>>>>> Aditya Narayan
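
To make the timeline schema discussed in this thread concrete, here is a rough
in-memory sketch using plain Python structures rather than any Cassandra
client. It models one user's timeline row as a chronologically ordered list of
(reminder ID, tags) pairs, standing in for supercolumn names and their
subcolumns, and picks the latest reminders whose tags match the user's
selection. All names and sample data are hypothetical, and the limit of 50
comes from the "around 50" figure mentioned above:

```python
from typing import Dict, List, Set, Tuple

# One timeline row per user: (reminder_id, tags) pairs in chronological order.
# reminder_id stands in for the supercolumn name, tags for its subcolumns.
TimelineRow = List[Tuple[str, Set[str]]]


def latest_matching(row: TimelineRow, wanted_tags: Set[str], limit: int = 50) -> List[str]:
    """Return IDs of the newest `limit` reminders that carry any wanted tag.

    Results are ordered newest first.
    """
    if limit <= 0:
        return []
    latest = row[-limit:]  # the most recent entries sit at the end of the row
    return [rid for rid, tags in reversed(latest) if tags & wanted_tags]


# Hypothetical sample data: the timeline rows and the reminder-detail rows.
timeline: Dict[str, TimelineRow] = {
    "user42": [
        ("r1", {"work"}),
        ("r2", {"home", "bills"}),
        ("r3", {"work", "urgent"}),
    ],
}
reminder_details: Dict[str, Dict[str, str]] = {
    "r1": {"text": "Send status report"},
    "r2": {"text": "Pay electricity bill"},
    "r3": {"text": "Call the client"},
}

if __name__ == "__main__":
    for rid in latest_matching(timeline["user42"], {"work"}):
        print(rid, reminder_details[rid]["text"])
```

Serializing the tags into a single standard column would amount to storing the
tag set as one value per reminder ID; the filtering step stays the same either
way, since all tags for a reminder are read together.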
