Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: domain of mvallebr@gmail.com designates
 209.85.214.44 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <CC89B8E9.11F8D%Dean.Hiller@nrel.gov>
References: 
 <CABKQidt+BEK_T-CYD7bUXF3GTgoLtQDZ1sgTRnKJUppK3CKEBA@mail.gmail.com>
	<CC89B8E9.11F8D%Dean.Hiller@nrel.gov>
Date: Thu, 27 Sep 2012 11:45:39 -0300
Message-ID: 
 <CABKQiduXkk4Evm1wFZqJRvc5YV+6KO6Spy9Fp5MGVecXv2F2jg@mail.gmail.com>
Subject: Re: 1000's of column families
From: Marcelo Elias Del Valle <mvallebr@gmail.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=f46d040838d32a546804caaffe26

--f46d040838d32a546804caaffe26
Content-Type: text/plain; charset=ISO-8859-1

Dean,

     In the relational world, I used to work with Hibernate and O/R mapping.
There were times when I used 3 classes (2 inheriting from the third) and
mapped all of them to 1 table. The common part was in the superclass and
each subclass had its own columns. The table, however, had to have all
the columns, and that made the design hard, as creating more subclasses
required changes to the table.
     However, if you use playOrm and if playOrm has/had a feature to allow
inheritance mapping to a CF, it would solve your problem, wouldn't it? Of
course, it is probably much harder than it appears... :D
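
To make the idea concrete, here is a minimal sketch of what such a mapping
might look like. It is not playOrm's API: a plain Python dict stands in for
the CF, and the class names, the "type" discriminator column, and the
save/load helpers are all hypothetical. Because a Cassandra row only stores
the columns it is given, each subclass writes just its own columns and the
CF itself never needs to change when a subclass is added.

```python
# Hypothetical sketch: a plain dict stands in for one column family (CF).
# None of these names come from playOrm or Hibernate.

DISCRIMINATOR = "type"  # column recording which subclass a row holds


class Device:
    """Common part: columns every device row carries."""

    def __init__(self, device_id, time):
        self.device_id = device_id
        self.time = time

    def to_columns(self):
        # Copy the instance attributes and tag the row with its class name.
        columns = dict(vars(self))
        columns[DISCRIMINATOR] = type(self).__name__
        return columns


class TemperatureDevice(Device):
    def __init__(self, device_id, time, temperature):
        super().__init__(device_id, time)
        self.temperature = temperature


class Co2Device(Device):
    def __init__(self, device_id, time, co2_percent):
        super().__init__(device_id, time)
        self.co2_percent = co2_percent


# Maps the discriminator value back to the class that wrote the row.
REGISTRY = {cls.__name__: cls for cls in (TemperatureDevice, Co2Device)}


def save(cf, row_key, obj):
    # Each row stores only the columns its own subclass defines.
    cf[row_key] = obj.to_columns()


def load(cf, row_key):
    columns = dict(cf[row_key])
    cls = REGISTRY[columns.pop(DISCRIMINATOR)]
    return cls(**columns)
```

Adding a new device type in this sketch means adding a class and a REGISTRY
entry; existing rows are untouched.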

Best regards,
Marcelo Valle.

2012/9/27 Hiller, Dean <Dean.Hiller@nrel.gov>

> We have 1000's of different building devices and we stream data from these
> devices.  The format and data from each one varies so one device has
> temperature at timeX with some other variables, another device has CO2
> percentage and other variables.  Every device is unique and streams its
> own data.  We dynamically discover devices and register them.  Basically,
> one CF or table per thing really makes sense in this environment.  While we
> could try to find out which devices "are" similar, this would really be a
> pain and some devices add some new variable into the equation.  NOT only
> that, but researchers can register new datasets and upload them as well,
> and they do NOT necessarily want to share each dataset with other
> researchers, so we have security groups and each CF belongs to security
> groups.  We dynamically create CFs on the fly as people register new
> datasets.
>
> On top of that, when the data sets get too large, we probably want to
> partition a single CF into time partitions.  We could create one CF and put
> all the data and have a partition per device, but then a time partition
> will contain data from "multiple" devices, meaning we need to shrink our
> time partition size, whereas if we have a CF per device, the time partition
> can be larger as it is only for that one device.
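
The partition-size trade-off described above can be illustrated with a
small sketch. Everything here is an assumption made for illustration:
samples are (device_id, epoch_seconds) pairs, partitions are fixed-width
time buckets, and the function names are made up. It only counts how many
samples land in each partition under the two layouts.

```python
from collections import Counter

DAY = 24 * 60 * 60  # bucket width in seconds, chosen arbitrarily


def time_bucket(ts, bucket_seconds):
    """Round an epoch timestamp down to the start of its time bucket."""
    return ts - ts % bucket_seconds


def partition_sizes(samples, bucket_seconds, per_device):
    """Count samples per partition.

    per_device=False models one shared CF partitioned by time only, so a
    bucket holds samples from every device.
    per_device=True models one CF per device, so a bucket holds samples
    from a single device.
    """
    sizes = Counter()
    for device_id, ts in samples:
        bucket = time_bucket(ts, bucket_seconds)
        key = (device_id, bucket) if per_device else bucket
        sizes[key] += 1
    return sizes
```

With the same bucket width, the largest partition in the shared layout grows
with the number of devices, while in the per-device layout it depends only
on one device's sample rate.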
>
> THEN, on top of that, we have a meta CF for these devices.  Some people
> want to query for streams that match criteria, which returns a CF name,
> and then they query that CF name, so we almost need a query with variables
> like select cfName from Meta where x = y and then select * from cfName
> where xxxxx. Which we can do today.
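
The two-step lookup described above could be sketched roughly as follows.
This is an in-memory illustration only: lists of dicts stand in for the
Meta CF and the per-device CFs, the "cfName" column name is taken from the
query in the text, and the function names and the predicate argument are
hypothetical.

```python
def find_cf_names(meta_rows, column, value):
    """Step 1: select cfName from Meta where column = value."""
    return [row["cfName"] for row in meta_rows if row.get(column) == value]


def query_streams(meta_rows, column_families, column, value, predicate):
    """Step 2: for each matching name, select * from <cfName> where predicate.

    column_families maps a CF name to its list of rows; predicate is any
    callable that takes a row and returns True for rows to keep.
    """
    results = {}
    for cf_name in find_cf_names(meta_rows, column, value):
        rows = column_families.get(cf_name, [])
        results[cf_name] = [row for row in rows if predicate(row)]
    return results
```

The result is keyed by CF name, so callers can still tell which stream each
row came from.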
>
> Dean
>
> From: Marcelo Elias Del Valle <mvallebr@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Thursday, September 27, 2012 8:01 AM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: 1000's of column families
>
> Out of curiosity, is it really necessary to have that amount of CFs?
> I am probably still used to relational databases, where you would use a
> new table only when you need to store a different kind of data. As
> Cassandra stores anything in each CF, it might make sense to have
> a lot of CFs to store your data...
> But why wouldn't you use a single CF with partitions in this case?
> Wouldn't it be the same thing? I am asking because I might learn a new
> modeling technique with the answer.
>
> []s
>
> 2012/9/26 Hiller, Dean <Dean.Hiller@nrel.gov>
> We are streaming data with 1 stream per CF and we have 1000's of CFs.
> The tools are all geared to analyzing ONE column family at a time :(.
> If I remember correctly, Cassandra supports as many CFs as you
> want, correct?  Even though I am going to have tons of fun with
> limitations on the tools, correct?
>
> (I may end up wrapping the node tool with my own aggregate calls if needed
> to sum up multiple column families and such).
>
> Thanks,
> Dean
>
>
>
> --
> Marcelo Elias Del Valle
> http://mvalle.com - @mvallebr
>


-- 
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr

--f46d040838d32a546804caaffe26--