Date: Thu, 26 Nov 2009 10:12:15 -0800
From: Anthony Molinaro
To: cassandra-user@incubator.apache.org
Subject: Re: Modeling question
Message-ID: <20091126181215.GA65552@alumni.caltech.edu>
In-Reply-To: <828083e70911260052r62544f64rabac24914ae857f0@mail.gmail.com>

On Thu, Nov 26, 2009 at 09:52:30AM +0100, gabriele renzi wrote:
> in my team, we are considering using cassandra for our project in
> place of a (pseudo)relational solution, but I am not sure on how we
> should handle a couple of modeling issues.
> Basically, my problem is how to bring into cassandra a db where
> elements are in the form with
> the pair is unique (think weight>) and we mostly do queries in the form
> select secondary, data from db where primary= x ---- perfect fit
> for cassandra
>
> and in batch jobs we want to rewrite the whole thing _but_ using the
> other key for lookup, akin to
> 1. insert or update (primary, secondary, data) values (.. .. .. ) --
> we can do this using the primary lookup
> 2. delete from db where secondary = x and not in just inserted -- how
> do we do this?
>
> it is my understanding that cassandra does not support secondary
> indexes so we would have to do a full scan to perform the #2
> operation, or we should mantain the second index by ourselves indexed
> on secondary and containing references to the primary.

Unless you are using order-preserving partitioning, which might or might
not be what you want, you won't be able to do a full scan.
Instead, you should probably have two column families, one keyed by
primary and one by secondary, each with a column for the other; then you
can do your operations. It uses more space, but disk is cheap, so that is
probably not a big deal.

If you have to model a many-to-many relationship, you can use super
columns. So I would imagine 2 super column families like

Primary Super Column
{
  '' => {
    '' => { 'data' => "" },
    '' => { 'data' => "" }
  }
}

Secondary Super Column
{
  '' => {
    '' => { 'data' => "" },
    '' => { 'data' => "" }
  }
}

You do your inserts into both, and for deletes you do a get_slice for the
secondary id, which will give you all primary ids which contain the
secondary id. Then you can delete the entries you no longer want from
both families.

HTH,

-Anthony

-- 
------------------------------------------------------------------------
Anthony Molinaro
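
The two-family scheme described in this message can be sketched in plain
Python, with nested dicts standing in for the two super column families.
This is only an illustration under stated assumptions: it does not use any
Cassandra client API, and the class and method names (DualIndex, insert,
primaries_for, rewrite_secondary) are hypothetical. The primaries_for
lookup plays the role that get_slice on the secondary family would play.

```python
from collections import defaultdict


class DualIndex:
    """In-memory stand-in for the two super column families (hypothetical)."""

    def __init__(self):
        # by_primary[primary][secondary] -> data
        self.by_primary = defaultdict(dict)
        # by_secondary[secondary][primary] -> data
        self.by_secondary = defaultdict(dict)

    def insert(self, primary, secondary, data):
        # Every write goes to both families so either key can be looked up.
        self.by_primary[primary][secondary] = data
        self.by_secondary[secondary][primary] = data

    def primaries_for(self, secondary):
        # Stand-in for a get_slice on the secondary family:
        # all primary ids currently stored under this secondary id.
        return set(self.by_secondary.get(secondary, {}))

    def delete(self, primary, secondary):
        # Remove the pair from both families; missing pairs are ignored.
        self.by_primary.get(primary, {}).pop(secondary, None)
        self.by_secondary.get(secondary, {}).pop(primary, None)

    def rewrite_secondary(self, secondary, fresh):
        """Batch step: upsert `fresh` ({primary: data}), then drop the rest.

        Returns the set of primary ids that were removed.
        """
        for primary, data in fresh.items():
            self.insert(primary, secondary, data)
        stale = self.primaries_for(secondary) - set(fresh)
        for primary in stale:
            self.delete(primary, secondary)
        return stale
```

Calling rewrite_secondary("x", {...}) corresponds to steps 1 and 2 of the
batch job in the quoted question: the upserts go through both families, and
the "not in just inserted" set is computed from the secondary-keyed lookup
rather than from a full scan.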