From: Jonathan Ellis
Date: Tue, 9 Mar 2010 07:23:00 -0600
Subject: Re: schema design question
To: cassandra-user@incubator.apache.org

On Tue, Mar 9, 2010 at 3:53 AM, Matteo Caprari wrote:
> Thanks Jonathan.
>
> Correct me if I'm wrong: you are suggesting that each time we receive a new
> row (item, [users]) we do 2 operations:
>
> 1) insert (or merge) this row 'as it is' (item, [users])
> 2) for each user in [users]: insert (user, [item])
>
> Each incoming item is liked by 100 users, so it would be 100 db ops per item.
> User ids are 20b, so it's about 2k per item sent to the database.

Right.

> At about 10 items/sec, we are looking at 1k db ops/sec or 20k/sec.
>
> Can you make a gross estimate of hardware requirements?

One quad-core node can handle ~14000 inserts per second so you are in
good shape.

> We don't know when the like-ing happened: is there something like
> incremental column names?

You can use insert time, or just use a LexicalUUID.

> Or can I use item_id as column name and a null-ish placeholder as value?

Or that too.

> I share Keith's concern: if we use Long as column names, won't we end up
> seeing just one user
> instead of 'all users that liked N items'?

That's true.
So you'd want to use a custom comparator where the first 64 bits are the
Long and the rest is the userid, for instance. (Long + something else
is common enough that we might want to add it to the defaults...)

-Jonathan
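
To make the "Long + userid" column name idea concrete, here is a minimal sketch of the byte layout and the ordering it implies. It is only an illustration in Python, not Cassandra code: the function names (make_column_name, split_column_name, compare_column_names) are hypothetical, the Long is assumed to be a signed big-endian 64-bit value, and a real comparator would have to be implemented on the server side.

```python
import functools
import struct
from typing import Tuple


def make_column_name(count: int, user_id: bytes) -> bytes:
    """Pack a signed 64-bit Long (big-endian), then append the raw user id."""
    return struct.pack(">q", count) + user_id


def split_column_name(name: bytes) -> Tuple[int, bytes]:
    """Recover (Long, user id) from a packed column name."""
    (count,) = struct.unpack(">q", name[:8])
    return count, name[8:]


def compare_column_names(a: bytes, b: bytes) -> int:
    """Order by the Long first, then by the user id bytes.

    Returns -1, 0 or 1, in the style of a comparator.
    """
    key_a, key_b = split_column_name(a), split_column_name(b)
    return (key_a > key_b) - (key_a < key_b)


if __name__ == "__main__":
    # Two users share the same count (3) but still get distinct column names.
    names = [
        make_column_name(3, b"carol"),
        make_column_name(7, b"alice"),
        make_column_name(3, b"bob"),
    ]
    ordered = sorted(names, key=functools.cmp_to_key(compare_column_names))
    for name in ordered:
        print(split_column_name(name))
```

Because the userid is part of the name, two users with the same Long no longer overwrite each other, and all columns sharing a given Long sort next to each other.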