cassandra-user mailing list archives

From Edward Capriolo <>
Subject Re: 1000's of column families
Date Thu, 27 Sep 2012 17:59:21 GMT
Hector also offers support for 'Virtual Keyspaces' which you might
want to look at.

On Thu, Sep 27, 2012 at 1:10 PM, Aaron Turner <> wrote:
> On Thu, Sep 27, 2012 at 3:11 PM, Hiller, Dean <> wrote:
>> We have 1000's of different building devices and we stream data from
>> these devices. The format and data from each one varies, so one device
>> has temperature at timeX with some other variables, and another device
>> has CO2 percentage and other variables. Every device is unique and
>> streams its own data. We dynamically discover devices and register
>> them. Basically, one CF or table per thing really makes sense in this
>> environment. While we could try to find out which devices "are"
>> similar, this would really be a pain, and some devices add some new
>> variable into the equation. NOT only that, but researchers can register
>> new datasets and upload them as well, and they do NOT necessarily want
>> to share each dataset with other researchers, so we have security
>> groups and each CF belongs to security groups. We dynamically create
>> CF's on the fly as people register new datasets.
>> On top of that, when the data sets get too large, we probably want to
>> partition a single CF into time partitions. We could create one CF, put
>> all the data in it, and have a partition per device, but then a time
>> partition will contain data from "multiple" devices, meaning we need to
>> shrink our time partition size, whereas if we have a CF per device, the
>> time partition can be larger as it is only for that one device.
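
A rough sketch of the time bucketing described above, in Python. The helper name `time_bucket` and the bucket widths are hypothetical and do not come from the thread; the only point is that a wider bucket (as a CF per device would allow) puts more of one device's samples into a single partition.

```python
from datetime import datetime, timezone


def time_bucket(ts: datetime, bucket_seconds: int) -> int:
    """Return the epoch second at which the bucket containing `ts` starts.

    Hypothetical helper: in a CF-per-device layout this value alone could
    identify the time partition inside that device's CF.
    """
    if ts.tzinfo is None:
        raise ValueError("ts must be timezone-aware")
    if bucket_seconds <= 0:
        raise ValueError("bucket_seconds must be positive")
    epoch = int(ts.timestamp())
    # Round down to the nearest multiple of the bucket width.
    return epoch - (epoch % bucket_seconds)


if __name__ == "__main__":
    sample = datetime(2012, 9, 27, 17, 59, 21, tzinfo=timezone.utc)
    print(time_bucket(sample, 86400))  # daily bucket
    print(time_bucket(sample, 3600))   # hourly bucket
```
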
>> THEN, on top of that, we have a meta CF for these devices, so some
>> people want to query for streams that match criteria, which returns a
>> CF name, and then they query that CF name. So we almost need a query
>> with variables, like select cfName from Meta where x = y and then
>> select * from cfName where xxxxx. Which we can do today.
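
The two-step lookup above can be sketched with plain in-memory Python structures. This is only a stand-in, not a Cassandra or Hector API: the `meta_rows` list, the `column_families` dict, and both function names are assumptions made for illustration.

```python
def find_cf_names(meta_rows, **criteria):
    """Step 1: like 'select cfName from Meta where x = y'."""
    return [
        row["cfName"]
        for row in meta_rows
        if all(row.get(key) == value for key, value in criteria.items())
    ]


def query_streams(meta_rows, column_families, **criteria):
    """Step 2: read every CF whose meta row matched the criteria."""
    return {
        name: column_families.get(name, [])
        for name in find_cf_names(meta_rows, **criteria)
    }


if __name__ == "__main__":
    # Hypothetical meta CF and data CFs, modeled as plain Python objects.
    meta = [
        {"cfName": "dev_001", "kind": "temperature"},
        {"cfName": "dev_002", "kind": "co2"},
    ]
    data = {"dev_001": [{"timeX": 1, "temperature": 21.5}], "dev_002": []}
    print(query_streams(meta, data, kind="temperature"))
```
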
> How strict are your security requirements?  If it wasn't for that,
> you'd be much better off storing data on a per-statistic basis than
> per-device.  Hell, you could store everything in a single CF by using
> a composite row key:
> <devicename>|<stat type>|<instance>
> But yeah, there isn't a hard limit for the number of CF's, but there
> is overhead associated with each one and so I wouldn't consider your
> design as scalable.  Generally speaking, hundreds are ok, but
> thousands is pushing it.
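
For what it's worth, the composite row key suggested above could be built and parsed roughly like this in Python. The function names and the rule of rejecting components that contain the separator are my assumptions, not something stated in the thread.

```python
SEP = "|"


def make_row_key(device_name: str, stat_type: str, instance: str) -> str:
    """Build a '<devicename>|<stat type>|<instance>' row key."""
    parts = (device_name, stat_type, instance)
    for part in parts:
        # Assumption: components may not contain the separator, so the
        # key can always be split back into exactly three fields.
        if SEP in part:
            raise ValueError(f"component {part!r} contains {SEP!r}")
    return SEP.join(parts)


def parse_row_key(row_key: str):
    """Split a composite row key back into its three components."""
    parts = row_key.split(SEP)
    if len(parts) != 3:
        raise ValueError(f"expected 3 components, got {len(parts)}")
    return tuple(parts)


if __name__ == "__main__":
    key = make_row_key("ahu-3", "temperature", "0")
    print(key)
    print(parse_row_key(key))
```

With keys like this, all devices share one CF and the device identity lives in the row key rather than in the CF name.
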
