From: David McNelis
To: user@cassandra.apache.org
Date: Wed, 30 Nov 2011 15:07:18 -0600
Subject: Re: data modeling question

Personally, I would create a separate column family for each basic area. For example, to organize my sectors and symbols, I would create a column family where the key is the sector name and the column names are the symbols for that sector, i.e.:

sector : {
    key: sector name
    column names: symbols
    column values: null
}

Then I would have a column family for quotes where the key is the symbol, the column name is the timestamp, and the value is the quote:

quote : {
    key: symbol
    column names: timeuuid
    column values: quote at that time for that symbol
}

I would then use the same basic structure for your other column families, ticks and fundamentals.

In general, people tend to stay away from super column families when possible for several reasons, but the most commonly cited one is that when you read any subcolumn of a super column, the entire super column must be deserialized in order to access it. So if you have a bunch of large super columns, you run the risk of reading in a lot more data than is necessary to get the information you are looking for.

On Wed, Nov 30, 2011 at 2:57 PM, Deno Vichas wrote:
> hey all!
>
> i've started my first project using cassandra and have some data model
> questions. i'm working on an app that fetches stock market data. i need
> to keep track of when i fetch a set of data for any given stock in any
> sector; here's what i think my model should look like:
>
> fetches : {
>     <sector> : {
>         quote : {
>             <timeuuid> : {
>                 <symbol> : ---
>             }
>         }
>         ticks : {
>             <timeuuid> : {
>                 <symbol> : ---
>             }
>         }
>         fundamentals : {
>             <timeuuid> : {
>                 <symbol> : ---
>             }
>         }
>     }
> }
>
> is there anything less than ideal about doing it this way versus creating
> a separate CF per sector? how do you create a Super CF inside of a Super CF
> via the CLI?
>
> thanks,
> deno

--
*David McNelis*
Lead Software Engineer
Agentis Energy
www.agentisenergy.com
c: 219.384.5143

*A Smart Grid technology company focused on helping consumers of energy control an often under-managed resource.*
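
The two column families described in the reply above can be sketched in plain Python to show the shape of the data. This is only an illustration under stated assumptions: the dictionaries stand in for column families, and the names `sector_cf`, `quote_cf`, `add_symbol`, `record_quote`, `symbols_in`, and `quotes_for` are hypothetical helpers, not part of Cassandra or any client library.

```python
import uuid
from collections import defaultdict

# In-memory stand-ins for two column families (hypothetical, not a Cassandra API).
# Each maps row key -> {column name: column value}.
sector_cf = defaultdict(dict)  # row key: sector name, column name: symbol, value: None
quote_cf = defaultdict(dict)   # row key: symbol, column name: time-based UUID, value: quote


def add_symbol(sector, symbol):
    """Register a symbol under a sector; the column value is left empty (None)."""
    sector_cf[sector][symbol] = None


def record_quote(symbol, quote):
    """Store a quote under a new time-based UUID column and return that column name."""
    column_name = uuid.uuid1()  # version 1 UUIDs embed a timestamp
    quote_cf[symbol][column_name] = quote
    return column_name


def symbols_in(sector):
    """Return the symbols of a sector, sorted by column name."""
    return sorted(sector_cf[sector])


def quotes_for(symbol):
    """Return the quotes of a symbol, ordered by the timestamp inside each UUID."""
    columns = quote_cf[symbol]
    return [columns[name] for name in sorted(columns, key=lambda u: u.time)]
```

Looking up the quotes for one symbol touches only that symbol's row, which is the point of splitting the data into one column family per area rather than nesting everything under a sector.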