From: Peter Hsu
Subject: Column or SuperColumn
Date: Tue, 1 Jun 2010 19:33:22 -0700
To: user@cassandra.apache.org

I have a pretty simple
data modeling question. I don't know whether or not to use a CF or SCF in one instance. Here's my example.

I have a Store entry and locations for each store. So I have something like:

Using CF:

Store { //CF
  storeId { //row key
    storeName:str,
    storeLogo:image
  }
  storeId:locationId1 {
    locationName:str,
    latLong:coordinate
  }
  storeId:locationId2 {
    locationName:str,
    latLong:coordinate
  }
}

Using SCF:

Store { //SCF
  storeId { //row key
    store {
      storeName:str,
      storeLogo:image
    }
    locationId1 {
      locationName:str,
      latLong:coordinate
    }
    locationId2 {
      locationName:str,
      latLong:coordinate
    }
  }
}

Queries:

Reads:
1. Read the store and all its locations (could be done efficiently with a range query when using CF, since I'm using OPP)
2. Read only a particular location of a store (don't need the store metadata here)
3. Read only the store name info (don't need any location info here)

Writes:
1. Update store metadata (without touching location info)
2. Update location data for a store (without touching the rest of the store data)
3. Add a new location to an existing store (would have a unique identifier for the location, so no worries about having to do a read first)

I read that SuperColumns are not as fast as Columns, and obviously you can't have indexed subcolumns of supercolumns, but in this case I don't need the subcolumn indices. It seems cleaner to model it as a SuperColumn, but why would I want to pay a performance penalty instead of just concatenating my keys?

This seems like a fairly common pattern. What's the rule for deciding between CF and SCF?

Thanks,
Peter
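For reference, here is a minimal in-memory sketch (Python, no Cassandra client involved) of how the read and write operations listed above map onto each layout. Plain dicts stand in for rows, and a sorted list of row keys stands in for the key order an order-preserving partitioner gives. The class names, method names, and the ";" range-end trick are assumptions made for illustration only, and the sketch assumes row keys compare as plain ASCII strings. It says nothing about performance.

```python
from bisect import bisect_left, insort

# Hypothetical helpers for this sketch only; not part of any Cassandra client API.
SEP = ":"       # separator between storeId and locationId, as in the CF layout
SEP_NEXT = ";"  # next character after ":" in ASCII, used as an exclusive range end


class ColumnFamilyModel:
    """CF layout: store row keyed 'storeId', location rows keyed 'storeId:locationId'."""

    def __init__(self):
        self.rows = {}  # row key -> {column name: value}
        self.keys = []  # row keys in sorted order, standing in for OPP key ordering

    def put(self, row_key, columns):
        # Insert or update columns of one row; other rows are untouched.
        if row_key not in self.rows:
            insort(self.keys, row_key)
            self.rows[row_key] = {}
        self.rows[row_key].update(columns)

    def get(self, row_key):
        return self.rows.get(row_key)

    def range(self, start, end):
        # Half-open key range [start, end) in this sketch, mimicking a key-range scan.
        i = bisect_left(self.keys, start)
        j = bisect_left(self.keys, end)
        return [(k, self.rows[k]) for k in self.keys[i:j]]

    def store_with_locations(self, store_id):
        # Read 1: the store row plus only the 'storeId:...' location rows.
        locations = self.range(store_id + SEP, store_id + SEP_NEXT)
        return self.get(store_id), locations


class SuperColumnFamilyModel:
    """SCF layout: one row per store; a 'store' super column plus one per location."""

    def __init__(self):
        self.rows = {}  # row key -> {super column name: {subcolumn name: value}}

    def put(self, row_key, super_name, columns):
        # Insert or update subcolumns of one super column; sibling super columns are untouched.
        row = self.rows.setdefault(row_key, {})
        row.setdefault(super_name, {}).update(columns)

    def get_super(self, row_key, super_name):
        return self.rows.get(row_key, {}).get(super_name)

    def get_row(self, row_key):
        return self.rows.get(row_key, {})


cf = ColumnFamilyModel()
cf.put("store1", {"storeName": "Acme", "storeLogo": b"logo"})
cf.put("store1:loc1", {"locationName": "Downtown", "latLong": (37.77, -122.42)})
cf.put("store1:loc2", {"locationName": "Airport", "latLong": (37.62, -122.38)})
cf.put("store10", {"storeName": "Other", "storeLogo": b"logo"})

scf = SuperColumnFamilyModel()
scf.put("store1", "store", {"storeName": "Acme", "storeLogo": b"logo"})
scf.put("store1", "loc1", {"locationName": "Downtown", "latLong": (37.77, -122.42)})
scf.put("store1", "loc2", {"locationName": "Airport", "latLong": (37.62, -122.38)})

# Read 1: store and all locations
store, locs = cf.store_with_locations("store1")
print([k for k, _ in locs])           # ['store1:loc1', 'store1:loc2']
print(sorted(scf.get_row("store1")))  # ['loc1', 'loc2', 'store']

# Read 2: one location only; Read 3: store name only
print(cf.get("store1:loc2")["locationName"], scf.get_super("store1", "loc2")["locationName"])
print(cf.get("store1")["storeName"], scf.get_super("store1", "store")["storeName"])

# Write 3: add a new location without reading anything first
cf.put("store1:loc3", {"locationName": "Harbor", "latLong": (37.80, -122.41)})
scf.put("store1", "loc3", {"locationName": "Harbor", "latLong": (37.80, -122.41)})
```

One thing the sketch surfaces with concatenated keys: a naive scan from "store1" to "store1;" would also return the unrelated "store10" row, because "store10" sorts between those bounds. Scanning only the "store1:" prefix (or using fixed-width store IDs) avoids that. The cost of reading an unindexed super column is not modeled here.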