cassandra-user mailing list archives

From Evan Weaver <>
Subject Re: schema example
Date Sat, 04 Jul 2009 01:53:32 GMT
(From talking on IRC):

I think this boils down to the offset/limit vs. token/limit debate.

Token/limit is fine in all cases for me, but you still have to be able
to query the head of the list (with a limit, but no token) to get
started. Right now there is no facility for that on time-sorted columns:

  list<column_t> get_columns_since(1:string tablename, 2:string key,
3:string columnParent, 4:i64 timeStamp)

I don't think token ranges are supported on time columns, either.

Also, to be optimally usable, you need to be able to begin a
token-based pagination system from either the head or tail of the
list, but that may not be possible with sstables.
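
To make the token/limit idea concrete, here is a minimal Python sketch over an in-memory list. It is not Cassandra's API: `page_by_token`, the (timestamp, value) layout, and the use of the last-seen timestamp as the token are all assumptions for illustration, and it assumes timestamps are unique. Passing no token returns the head of the list, which is the case described above as missing.

```python
from bisect import bisect_right


def page_by_token(columns, limit, token=None):
    """Return (page, next_token) from `columns`.

    `columns` is a list of (timestamp, value) pairs sorted newest first.
    `token` is the timestamp of the last column already seen, or None
    to start from the head of the list. (Hypothetical helper.)
    """
    if token is None:
        start = 0
    else:
        # Negate timestamps so the list is ascending for bisect, then
        # skip every column whose timestamp is >= the token.
        ascending_keys = [-ts for ts, _ in columns]
        start = bisect_right(ascending_keys, -token)
    page = columns[start:start + limit]
    has_more = start + limit < len(columns)
    next_token = page[-1][0] if page and has_more else None
    return page, next_token


columns = [(50, "e"), (40, "d"), (30, "c"), (20, "b"), (10, "a")]
first_page, token = page_by_token(columns, limit=2)
second_page, token2 = page_by_token(columns, limit=2, token=token)
last_page, token3 = page_by_token(columns, limit=2, token=token2)
```

Unlike offset/limit, the token stays valid when new columns are inserted at the head between requests, since it names a position by value rather than by count.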

It may just be an oversight...the API is confusingly organized, and
it's hard to be sure if some likely feature is there or not.



On Fri, Jul 3, 2009 at 6:06 PM, Evan Weaver<> wrote:
> That requires you to know the timestamp, so you can't just ask for the
> most recent one.
> Evan
> On Fri, Jul 3, 2009 at 6:02 PM, Jonathan Ellis<> wrote:
>> get_columns_since
>> On Fri, Jul 3, 2009 at 7:21 PM, Evan Weaver<> wrote:
>>> This helps a lot.
>>> However, I can't find any API method that actually lets me do a
>>> slice query on a time-sorted column, as necessary for the second blog
>>> example. I get the following error on r789419:
>>> InvalidRequestException: get_slice_from requires CF indexed by name
>>> Evan
>>> On Tue, May 19, 2009 at 8:00 PM, Jonathan Ellis<> wrote:
>>>> Mail storage, man, I think pretty much anything I could come up with
>>>> would look pretty simplistic compared to what "real" systems do in
>>>> that domain. :)
>>>> But blogs, I think I can handle those.  Let's make ours multiuser
>>>> or there isn't enough scale to make it interesting. :)
>>>> The interesting thing here is we want to be able to query two things
>>>> efficiently:
>>>>  - the most recent posts belonging to a given blog, in reverse
>>>> chronological order
>>>>  - a single post and its comments, in chronological order
>>>> At first glance you might think we can again reasonably do this with a
>>>> single CF, this time a super CF:
>>>> <ColumnFamily ColumnType="Super" ColumnSort="Time" Name="Post"/>
>>>> The key is the blog name, the supercolumns are posts and the
>>>> subcolumns are comments.  This would be reasonable BUT supercolumns
>>>> are just containers, they have no data or timestamp associated with
>>>> them directly (only through their subcolumns).  So you cannot sort a
>>>> super CF by time.
>>>> So instead what I would do would be to use two CFs:
>>>> <ColumnFamily ColumnSort="Time" Name="Post"/>
>>>> <ColumnFamily ColumnSort="Time" Name="Comment"/>
>>>> For the first, the keys used would be blog names, and the columns
>>>> would be the post titles and body.  So to get a list of most recent
>>>> posts you just do a slice query.  Even though Cassandra currently
>>>> handles large groups of columns sub-optimally, even with a blog
>>>> updated several times a day you'd be safe taking this approach (i.e.
>>>> we'll have that problem fixed before you start seeing it :).
>>>> For the second, the keys are blog name<delimiter><post title>.  The
>>>> columns are the comment data.  You can serialize these a number of
>>>> ways; I would probably use title as the column name and have the value
>>>> be the author + body (e.g. as a json dict).  Again we use the slice
>>>> call to get the comments in order.  (We will have to manually reverse
>>>> what slice gives us since time sort is always reverse chronological
>>>> atm, but the overhead of doing this in memory will be negligible.)
>>>> Does this help?
>>>> -Jonathan
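
The two-CF layout described in the message above can be sketched in memory roughly as follows. This is only an illustration of the data model, not Cassandra client code: the `TimeSortedCF` class, its `slice` method, the `|` delimiter, and the sample blog name are all assumptions. The slice returns newest-first, matching the reverse chronological time sort, so comments are reversed in memory to read in chronological order.

```python
import json

DELIM = "|"  # hypothetical delimiter between blog name and post title


class TimeSortedCF:
    """In-memory stand-in for a ColumnSort="Time" column family."""

    def __init__(self):
        self.rows = {}  # row key -> {column name: (timestamp, value)}

    def insert(self, key, name, value, timestamp):
        self.rows.setdefault(key, {})[name] = (timestamp, value)

    def slice(self, key, limit):
        """Return up to `limit` (name, value) pairs, newest first."""
        columns = self.rows.get(key, {})
        newest_first = sorted(
            columns.items(), key=lambda item: item[1][0], reverse=True
        )
        return [(name, value) for name, (_, value) in newest_first[:limit]]


post_cf = TimeSortedCF()
comment_cf = TimeSortedCF()

# Post CF: key is the blog name, column name is the title, value is the body.
post_cf.insert("evanblog", "Hello", "first body", timestamp=1)
post_cf.insert("evanblog", "Schemas", "second body", timestamp=2)

# Comment CF: key is blog name + delimiter + post title; value is a JSON dict.
comment_key = "evanblog" + DELIM + "Schemas"
comment_cf.insert(
    comment_key, "Nice",
    json.dumps({"author": "jonathan", "body": "Good post"}), timestamp=3,
)
comment_cf.insert(
    comment_key, "Question",
    json.dumps({"author": "jun", "body": "What about paging?"}), timestamp=4,
)

recent_posts = post_cf.slice("evanblog", limit=10)
# Reverse the newest-first slice to get chronological order.
comments_in_order = list(reversed(comment_cf.slice(comment_key, limit=100)))
```
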
>>>> On Tue, May 19, 2009 at 11:49 AM, Evan Weaver <> wrote:
>>>>> Even if it's not actually in real-life use, some examples for common
>>>>> domains would really help clarify things.
>>>>>  * blog
>>>>>  * email storage
>>>>>  * search index
>>>>> etc.
>>>>> Evan
>>>>> On Mon, May 18, 2009 at 8:19 PM, Jonathan Ellis <> wrote:
>>>>>> Does anyone have a simple app schema they can share?
>>>>>> I can't share the one for our main app.  But we do need an example
>>>>>> here.  A real one would be nice if we can find one.
>>>>>> I checked App Engine.  They don't have a whole lot of examples either.
>>>>>>  They do have a really simple one:
>>>>>> The most important thing in Cassandra modeling is choosing a good key,
>>>>>> since that is what most of your lookups will be by.  Keys are also how
>>>>>> Cassandra scales -- Cassandra can handle effectively infinite keys
>>>>>> (given enough nodes obviously) but only thousands to millions of
>>>>>> columns per key/CF (depending on what API calls you use -- Jun is
>>>>>> adding one now that does not deserialize everything in the whole row
>>>>>> into memory.  The rest will need to follow this model eventually).
>>>>>> For this guestbook I think the choice is obvious: use the name as the
>>>>>> key, and have a single simple CF for the messages.  Each column will
>>>>>> be a message (you can even use the mandatory timestamp field as part
>>>>>> of your user-visible data.  win!).  You get the list (or page) of
>>>>>> users with get_key_range and then their messages with get_slice.
>>>>>> <ColumnFamily ColumnSort="Name" Name="Message"/>
>>>>>> Anyone got another one for pedagogical purposes?
>>>>>> -Jonathan
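
As a rough sketch of the guestbook model in the message above: one name-sorted CF, the user name as the row key, and one column per message. The method names `get_key_range` and `get_slice` come from the thread, but the signatures below are assumptions made for this in-memory stand-in and are not the actual Thrift API; the `NameSortedCF` class, the column names, and the sample data are likewise hypothetical.

```python
class NameSortedCF:
    """In-memory stand-in for a ColumnSort="Name" column family."""

    def __init__(self):
        self.rows = {}  # row key -> {column name: (value, timestamp)}

    def insert(self, key, name, value, timestamp):
        self.rows.setdefault(key, {})[name] = (value, timestamp)

    def get_key_range(self, start, limit):
        """Return up to `limit` row keys >= `start`, in sorted order."""
        return [key for key in sorted(self.rows) if key >= start][:limit]

    def get_slice(self, key, limit):
        """Return up to `limit` (name, value, timestamp) tuples by name."""
        columns = self.rows.get(key, {})
        return [(name,) + columns[name] for name in sorted(columns)][:limit]


message_cf = NameSortedCF()
message_cf.insert("alice", "msg-001", "Hi there", timestamp=100)
message_cf.insert("alice", "msg-002", "Nice site", timestamp=105)
message_cf.insert("bob", "msg-001", "Hello", timestamp=101)

# First page of users, then one user's messages with their timestamps.
users = message_cf.get_key_range("", limit=10)
alice_messages = message_cf.get_slice("alice", limit=10)
```

The timestamp travels with each column, so it can be shown to the user without storing it a second time in the value.
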
>>>>> --
>>>>> Evan Weaver
>>> --
>>> Evan Weaver
> --
> Evan Weaver

Evan Weaver
