From: Aaron Turner
Date: Thu, 27 Sep 2012 18:10:53 +0100
Subject: Re: 1000's of column families
To: user@cassandra.apache.org

On Thu, Sep 27, 2012 at 3:11 PM, Hiller, Dean wrote:
> We have 1000's of different building devices and we stream data from these devices. The format and data from each one varies, so one device has temperature at timeX with some other variables, while another device has CO2 percentage and other variables. Every device is unique and streams its own data. We dynamically discover devices and register them. Basically, one CF or table per thing really makes sense in this environment. While we could try to find out which devices "are" similar, this would really be a pain, and some devices add some new variable into the equation. NOT only that, but researchers can register new datasets and upload them as well, and they do NOT necessarily want to share each dataset with other researchers, so we have security groups and each CF belongs to security groups. We dynamically create CF's on the fly as people register new datasets.
>
> On top of that, when the data sets get too large, we probably want to partition a single CF into time partitions. We could create one CF, put all the data in it, and have a partition per device, but then a time partition will contain data from "multiple" devices, meaning we need to shrink our time partition size. If we have a CF per device, the time partition can be larger, as it is only for that one device.
>
> THEN, on top of that, we have a meta CF for these devices, so some people want to query for streams that match criteria AND which returns a CF name, and then they query that CF name. So we almost need a query with variables, like select cfName from Meta where x = y and then select * from cfName where xxxxx. Which we can do today.

How strict are your security requirements? If it wasn't for that, you'd be much better off storing data on a per-statistic basis rather than per-device. Hell, you could store everything in a single CF by using a composite row key:

||

But yeah, there isn't a hard limit on the number of CF's, but there is overhead associated with each one, so I wouldn't consider your design scalable. Generally speaking, hundreds are ok, but thousands is pushing it.

-- 
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"
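
The components of the composite row key suggested in the reply above did not survive in this copy of the message, so what follows is only a rough sketch of the general idea, not the key layout the author had in mind. It assumes three hypothetical components (a device id, a statistic name, and a UTC day bucket) joined with "|". The function names, the day-sized bucket, and the separator are all illustrative assumptions.

```python
from datetime import datetime, timezone

# Assumption: "|" separates key components; the original message's
# actual components are not recoverable, so these are hypothetical.
SEPARATOR = "|"


def make_row_key(device_id: str, statistic: str, ts: datetime) -> str:
    """Build a composite row key: device id, statistic name, UTC day bucket."""
    for part in (device_id, statistic):
        if SEPARATOR in part:
            raise ValueError(f"component {part!r} must not contain {SEPARATOR!r}")
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    # One row per device/statistic/day keeps any single row bounded in size.
    bucket = ts.astimezone(timezone.utc).strftime("%Y%m%d")
    return SEPARATOR.join((device_id, statistic, bucket))


def parse_row_key(key: str) -> tuple:
    """Split a composite row key back into its three components."""
    device_id, statistic, bucket = key.split(SEPARATOR)
    return device_id, statistic, bucket


if __name__ == "__main__":
    when = datetime(2012, 9, 27, 17, 11, tzinfo=timezone.utc)
    key = make_row_key("dev42", "temperature", when)
    print(key)
    print(parse_row_key(key))
```

With a scheme along these lines, every device and statistic shares one CF, and the time bucket in the key plays the role that per-CF time partitions play in the per-device design. Whether a day is the right bucket size depends on how much data each device streams.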