Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: neutral (athena.apache.org: local policy)
From: Peter Hsu <peter@motivecast.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
Subject: Column or SuperColumn
Date: Tue, 1 Jun 2010 19:33:22 -0700
Message-Id: <63844776-9412-43C4-9B3C-10EEA35C5F9B@motivecast.com>
To: user@cassandra.apache.org
Mime-Version: 1.0 (Apple Message framework v1078)

I have a pretty simple data modeling question: I don't know whether to
use a ColumnFamily (CF) or a SuperColumnFamily (SCF) in one particular case.

Here's my example.  I have a Store entry and a set of locations for each
store, so I have something like:

Using CF:
Store { //CF
   storeId { //row key
      storeName:str,
      storeLogo:image
   }
   storeId:locationId1 {
      locationName:str,
      latLong:coordinate
   }
   storeId:locationId2 {
      locationName:str,
      latLong:coordinate
   }
}

Using SCF:
Store { //SCF
   storeId { //row key
      store {
          storeName:str,
          storeLogo:image
      }
      locationId1 {
          locationName:str,
          latLong:coordinate
      }
      locationId2 {
          locationName:str,
          latLong:coordinate
      }
   }
}
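
To make the comparison concrete, here is a rough in-memory sketch of the two layouts as plain Python dicts. This is not Cassandra client code; the sample ids (`s001`, `loc1`), the sample values, and the `flatten` helper are made up for illustration. It only shows that the SCF layout maps onto the CF layout by concatenating the row key and the supercolumn name.

```python
# CF layout: one flat row per store and one per location (concatenated keys).
store_cf = {
    "s001": {"storeName": "Acme", "storeLogo": b"<png>"},
    "s001:loc1": {"locationName": "Downtown", "latLong": (37.77, -122.42)},
    "s001:loc2": {"locationName": "Airport", "latLong": (37.62, -122.38)},
}

# SCF layout: one row per store, one supercolumn per logical entity.
store_scf = {
    "s001": {
        "store": {"storeName": "Acme", "storeLogo": b"<png>"},
        "loc1": {"locationName": "Downtown", "latLong": (37.77, -122.42)},
        "loc2": {"locationName": "Airport", "latLong": (37.62, -122.38)},
    }
}


def flatten(scf, meta_name="store"):
    """Turn the SCF layout into the CF layout by concatenating keys.

    The supercolumn named `meta_name` becomes the bare storeId row;
    every other supercolumn becomes a "storeId:locationId" row.
    """
    flat = {}
    for row_key, super_columns in scf.items():
        for sc_name, columns in super_columns.items():
            key = row_key if sc_name == meta_name else f"{row_key}:{sc_name}"
            flat[key] = dict(columns)
    return flat
```

In this toy form, `flatten(store_scf)` produces the same rows as `store_cf`, so the two layouts carry the same information and differ only in where the location id lives.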

Queries:

Reads:
 1. Read a store and all of its locations (could be done efficiently with
a range query when using CF, since I'm using OPP)
 2. Read only a particular location of a store (don't need the store
metadata here)
 3. Read only store name info (don't need any location info here)

Writes:
 1. Update store metadata (without touching location info)
 2. Update location data for a store (without touching the rest of the
store data)
 3. Add a new location to an existing store (each location would have a
unique identifier, so there's no need to do a read first)
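
As a sanity check on the CF variant, the reads and writes above can be sketched against a toy sorted key space. This is only a simulation of key ordering under OPP, not a Cassandra API: the `OrderedRows` class, the `location_key` helper, and the sample ids are hypothetical. It also assumes store ids are fixed width, so that no store id is a prefix of another; otherwise a single range scan starting at the bare `storeId` could pick up rows from a different store.

```python
from bisect import bisect_left, insort


class OrderedRows:
    """Toy stand-in for a CF whose row keys are kept in sorted order."""

    def __init__(self):
        self._keys = []  # sorted row keys
        self._rows = {}  # row key -> {column name: value}

    def put(self, key, columns):
        # Insert or update only the given columns; other columns are untouched.
        if key not in self._rows:
            insort(self._keys, key)
            self._rows[key] = {}
        self._rows[key].update(columns)

    def get(self, key, column_names=None):
        row = self._rows[key]
        if column_names is None:
            return dict(row)
        return {name: row[name] for name in column_names}

    def range(self, start, end):
        # Rows with start <= key < end, in key order.
        lo = bisect_left(self._keys, start)
        hi = bisect_left(self._keys, end)
        return [(k, dict(self._rows[k])) for k in self._keys[lo:hi]]


def location_key(store_id, location_id):
    return f"{store_id}:{location_id}"


store = OrderedRows()
store.put("s001", {"storeName": "Acme", "storeLogo": b"<png>"})
store.put(location_key("s001", "loc1"), {"locationName": "Downtown"})
store.put(location_key("s001", "loc2"), {"locationName": "Airport"})
store.put("s002", {"storeName": "Other"})

# Read 1: store plus all locations in one range scan.
# ';' is the ASCII character right after ':', so it bounds the prefix.
everything = store.range("s001", "s001;")
# Read 2: a single location, without the store metadata.
one_location = store.get(location_key("s001", "loc2"))
# Read 3: only the store name.
name_only = store.get("s001", ["storeName"])

# Write 1: update store metadata only.
store.put("s001", {"storeName": "Acme Corp"})
# Write 2: update one location only.
store.put(location_key("s001", "loc1"), {"latLong": (37.77, -122.42)})
# Write 3: add a new location without reading anything first.
store.put(location_key("s001", "loc3"), {"locationName": "Harbor"})
```

Each of the six operations touches either one row or one contiguous key range, which is the property the concatenated-key design relies on.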

I read that SuperColumns are not as fast as Columns, and obviously you
can't have indexed subcolumns of SuperColumns, but in this case I don't
need subcolumn indices.  It seems cleaner to model this as a SuperColumn,
but why would I want to pay a performance penalty instead of just
concatenating my keys?

This seems like a fairly common pattern.  What's the rule for deciding
between a CF and an SCF?

Thanks,
Peter