From: Evan Weaver
Date: Fri, 3 Jul 2009 18:53:32 -0700
Subject: Re: schema example
To: cassandra-user@incubator.apache.org

(From talking on IRC):

I think this boils down to the offset/limit vs. token/limit debate.
Token/limit is fine in all cases for me, but you still have to be able
to query the head of the list (with a limit, but no token) to get
started. Right now there is no facility for that on time-sorted column
families:

  list get_columns_since(1:string tablename, 2:string key, 3:string columnParent, 4:i64 timeStamp)

I don't think token ranges are supported on time columns, either.

Also, to be optimally useable, you need to be able to begin a
token-based pagination system from either the head or tail of the
list, but that may not be possible with sstables.

It may just be an oversight...the API is confusingly organized, and
it's hard to be sure if some likely feature is there or not.

Related:
http://issues.apache.org/jira/browse/CASSANDRA-261
http://issues.apache.org/jira/browse/CASSANDRA-217
http://issues.apache.org/jira/browse/CASSANDRA-263

Evan

On Fri, Jul 3, 2009 at 6:06 PM, Evan Weaver wrote:
> That requires you to know the timestamp, so you can't just ask for the
> most recent one.
>
> Evan
>
> On Fri, Jul 3, 2009 at 6:02 PM, Jonathan Ellis wrote:
>> get_columns_since
>>
>> On Fri, Jul 3, 2009 at 7:21 PM, Evan Weaver wrote:
>>> This helps a lot.
>>>
>>> However, I can't find any API method that actually lets me do a
>>> slice query on a time-sorted column, as necessary for the second blog
>>> example. I get the following error on r789419:
>>>
>>> InvalidRequestException: get_slice_from requires CF indexed by name
>>>
>>> Evan
>>>
>>> On Tue, May 19, 2009 at 8:00 PM, Jonathan Ellis wrote:
>>>> Mail storage, man, I think pretty much anything I could come up with
>>>> would look pretty simplistic compared to what "real" systems do in
>>>> that domain. :)
>>>>
>>>> But blogs, I think I can handle those.  Let's make ours multiuser
>>>> or there isn't enough scale to make it interesting. :)
>>>>
>>>> The interesting thing here is we want to be able to query two things
>>>> efficiently:
>>>>  - the most recent posts belonging to a given blog, in reverse
>>>> chronological order
>>>>  - a single post and its comments, in chronological order
>>>>
>>>> At first glance you might think we can again reasonably do this with a
>>>> single CF, this time a super CF:
>>>>
>>>>
>>>>
>>>> The key is the blog name, the supercolumns are posts and the
>>>> subcolumns are comments.  This would be reasonable BUT supercolumns
>>>> are just containers, they have no data or timestamp associated with
>>>> them directly (only through their subcolumns).  So you cannot sort a
>>>> super CF by time.
>>>>
>>>> So instead what I would do would be to use two CFs:
>>>>
>>>>
>>>>
>>>>
>>>> For the first, the keys used would be blog names, and the columns
>>>> would be the post titles and body.  So to get a list of most recent
>>>> posts you just do a slice query.  Even though Cassandra currently
>>>> handles large groups of columns sub-optimally, even with a blog
>>>> updated several times a day you'd be safe taking this approach (i.e.
>>>> we'll have that problem fixed before you start seeing it :).
>>>>
>>>> For the second, the keys are blog name.  The
>>>> columns are the comment data.
>>>> You can serialize these a number of ways; I would probably use
>>>> title as the column name and have the value be the author + body
>>>> (e.g. as a json dict).  Again we use the slice call to get the
>>>> comments in order.  (We will have to manually reverse what slice
>>>> gives us since time sort is always reverse chronological atm, but
>>>> the overhead of doing this in memory will be negligible.)
>>>>
>>>> Does this help?
>>>>
>>>> -Jonathan
>>>>
>>>> On Tue, May 19, 2009 at 11:49 AM, Evan Weaver wrote:
>>>>> Even if it's not actually in real-life use, some examples for common
>>>>> domains would really help clarify things.
>>>>>
>>>>>  * blog
>>>>>  * email storage
>>>>>  * search index
>>>>>
>>>>> etc.
>>>>>
>>>>> Evan
>>>>>
>>>>> On Mon, May 18, 2009 at 8:19 PM, Jonathan Ellis wrote:
>>>>>> Does anyone have a simple app schema they can share?
>>>>>>
>>>>>> I can't share the one for our main app.  But we do need an example
>>>>>> here.  A real one would be nice if we can find one.
>>>>>>
>>>>>> I checked App Engine.  They don't have a whole lot of examples either.
>>>>>> They do have a really simple one:
>>>>>> http://code.google.com/appengine/docs/python/gettingstarted/usingdatastore.html
>>>>>>
>>>>>> The most important thing in Cassandra modeling is choosing a good key,
>>>>>> since that is what most of your lookups will be by.  Keys are also how
>>>>>> Cassandra scales -- Cassandra can handle effectively infinite keys
>>>>>> (given enough nodes obviously) but only thousands to millions of
>>>>>> columns per key/CF (depending on what API calls you use -- Jun is
>>>>>> adding one now that does not deserialize everything in the whole CF
>>>>>> into memory.  The rest will need to follow this model eventually too).
>>>>>>
>>>>>> For this guestbook I think the choice is obvious: use the name as the
>>>>>> key, and have a single simple CF for the messages.
>>>>>> Each column will be a message (you can even use the mandatory
>>>>>> timestamp field as part of your user-visible data.  win!).  You
>>>>>> get the list (or page) of users with get_key_range and then their
>>>>>> messages with get_slice.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Anyone got another one for pedagogical purposes?
>>>>>>
>>>>>> -Jonathan
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Evan Weaver
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Evan Weaver
>>>
>>
>
>
>
> --
> Evan Weaver
>

-- 
Evan Weaver
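
For readers of the archive, the token/limit idea discussed at the top of this message can be sketched in a few lines of Python. This is only an in-memory illustration, not Cassandra code: the function name slice_by_token, the (timestamp, name) tuple layout, and the use of a timestamp as the paging token are all assumptions made for the example, and it further assumes timestamps are unique within a row. It shows why a query with a limit but no token is needed to reach the head of a time-sorted list before token-based paging can begin.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for one row of a time-sorted column family:
# each column is a (timestamp, column_name) pair. Not a Cassandra type.
Column = Tuple[int, str]


def slice_by_token(
    columns: List[Column], limit: int, token: Optional[int] = None
) -> Tuple[List[Column], Optional[int]]:
    """Return up to `limit` columns, newest first, strictly older than `token`.

    With token=None the slice starts at the head of the list, i.e. the
    "limit but no token" query described in the message above.
    """
    newest_first = sorted(columns, key=lambda c: c[0], reverse=True)
    if token is not None:
        newest_first = [c for c in newest_first if c[0] < token]
    page = newest_first[:limit]
    # Hand back a token only if older columns remain after this page.
    has_more = len(newest_first) > limit
    next_token = page[-1][0] if has_more else None
    return page, next_token


posts = [(100, "first post"), (200, "second post"), (300, "third post")]

# Start from the head: no timestamp has to be known in advance.
page1, token1 = slice_by_token(posts, limit=2)
# Continue from where the first page stopped.
page2, token2 = slice_by_token(posts, limit=2, token=token1)
```

Compared with offset/limit, the caller never has to count how many columns it has already seen; it only carries the last timestamp forward, which stays valid if newer columns are added between requests.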