Date: Sun, 13 Jun 2010 13:42:59 +0200
Subject: Re: Beginner Assumptions
From: Thomas Heller
To: user@cassandra.apache.org

Hey,

I'm sorry, I think I didn't make myself clear enough. I'm using Cassandra only to store the _results_ (the calculated time series), not the source data. Also, using "Beginner Assumptions" as the subject probably wasn't the best choice, since I'm more interested in the inner workings of Cassandra than in how to use it. ;)

> And the per hour counts are stored as json?

No, they are stored as byte arrays with a fixed size (96 = 24 x 4-byte integers).

> cassandra.get("/page/1", Slice("20100612"..."20100613"))

I know how to do it in Cassandra; I was just comparing it to others. I was interested to know whether

cassandra.get("/page/1", :start => "20100612", :count => 90)

is actually just as fast as

cassandra.get("/page/1", Slice("20100612", "20100613", ...))

with 90 keys.

>
>> Assumption #3:
> I doubt you data will grow at a fixed rate per row. (Unless you have
> always the same hit pattern for your pages) But you should be able to
> able to calculated the maximal required storage requirement. That said
> - I am wondering... where are you aggregating the counts per hour?
The data is currently just stored in logfiles, which are parsed once an hour in a map/reduce-like fashion (not stored in Cassandra). Even if there are no values to be saved, there will still be a column for this row with [0, 0, 0, ...]. I also do not need to increment any of those counters live. Hit patterns don't matter, since 1 million views per hour consume just the same space as 0 views (96 bytes fixed). I may at some point remove the 0 values to save space, but right now there is always one column per day per row.

>
> So you want to increment those counters per hit? I don't think there
> is an atomic increment semantic in cassandra yet. (Some one else to
> confirm?)

No, see above. Each view generates one entry in a logfile which is append-only (much like the Cassandra commitlog). Incrementing those counters live is very unlikely to happen, since they are just one part of the whole log map/reduce thing. The offline processing part is not moving into Cassandra anytime soon; I just want to put the results somewhere. SQL is fine for that (atm), but I was interested in some NoSQL and this seemed like a good use case (very structured data, only accessed by keys or key ranges, and the key is always known, i.e. no dynamic queries).

Cheers,
/thomas
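
P.S. To illustrate the fixed-size layout described above (one column per day holding 24 four-byte integers, 96 bytes in total), here is a minimal Ruby sketch. The helper names, the unsigned 32-bit interpretation and the big-endian byte order are assumptions made for this example; nothing above pins down the actual encoding, and no Cassandra client is involved.

```ruby
# Hypothetical helpers, not part of any Cassandra client library.
# Assumption: counts are unsigned 32-bit integers in big-endian order.
HOURS_PER_DAY = 24
BYTES_PER_COUNT = 4

# Pack 24 hourly counts into one fixed-size 96-byte column value.
def pack_hourly_counts(counts)
  unless counts.size == HOURS_PER_DAY
    raise ArgumentError, "expected #{HOURS_PER_DAY} counts, got #{counts.size}"
  end
  counts.pack("N#{HOURS_PER_DAY}")
end

# Unpack a 96-byte column value back into 24 hourly counts.
def unpack_hourly_counts(bytes)
  expected = HOURS_PER_DAY * BYTES_PER_COUNT
  unless bytes.bytesize == expected
    raise ArgumentError, "expected #{expected} bytes, got #{bytes.bytesize}"
  end
  bytes.unpack("N#{HOURS_PER_DAY}")
end

# A day without any views and a day with 1 million views per hour
# occupy exactly the same space.
empty_day = pack_hourly_counts(Array.new(HOURS_PER_DAY, 0))
busy_day  = pack_hourly_counts(Array.new(HOURS_PER_DAY, 1_000_000))
```

With this encoding the column value is always 96 bytes regardless of traffic, which is why hit patterns do not affect storage per row and per day.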