Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: domain of edlinuxguru@gmail.com
 designates 74.125.82.170 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <CDC8CAED.295BD%Dean.Hiller@nrel.gov>
References: 
 <CAGFecmQhBvhG91YkNMhvz8YMbLXH14Jtf+1c06=nDdVVk8b0Mg@mail.gmail.com>
	<CDC8CAED.295BD%Dean.Hiller@nrel.gov>
Date: Mon, 27 May 2013 13:10:23 -0400
Message-ID: 
 <CAENxBwykTN_etZsOmyEqSgFyE0ja7i7g9NXf9=5U7o902ogWRQ@mail.gmail.com>
Subject: Re: Using CQL to insert a column to a row dynamically
From: Edward Capriolo <edlinuxguru@gmail.com>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Content-Type: multipart/alternative; boundary=001a11c37dd8696bfb04ddb639a0

--001a11c37dd8696bfb04ddb639a0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

You can add wide rows to CQL3 tables, but you cannot add them in the same
way you can for non-CQL3 compact storage tables.

"What I'm not understanding is why there is so much emphasis on predefined
columns in CQL examples, particularly in the CREATE TABLE/COLUMNFAMILY
examples:"

^ I ask myself this all the time. I wrote a piece on it a while back:
http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/schema_vs_schema_less

IMHO there seems to be a huge over-emphasis on pushing the new X, the new
Y, or the new way of doing things. There are several very nice things in
CQL, especially when it comes to packing up complex composite types, but
there are several things it can't do well.

One thing that greatly irks me: if you followed the older advice on how to
do something with Cassandra 1.0.7, there is sometimes new advice on how to
do the same thing in 1.2.5. From reading the new documentation you might
conclude that the old way or the old tools are wrong or bad, but there is a
large group of people who actually like the old ways better! Your example
is the simplest case: "Should I be able to add columns?" Yes! Can you? Yes!
But not using the new stuff. (Well, it can be done, but not in exactly the
same way.)

My best advice: this is just one of those things where you read all the
available material and decide what is best for your case. Use the system
that works best with the least hoop jumping.

In a nutshell, it's very easy to have true schema-less wide rows, without
the * and the 'read the blog' caveats: use compact storage :)
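To make the compact-storage route concrete, here is a minimal Python sketch that turns the cassandra-cli statement from the quoted question, SET mycf[id]['arbitrary_column'] = 'foo';, into roughly equivalent CQL3 text. It assumes a hypothetical table mycf with a text row key, a text clustering column that holds the dynamic column name, and a text value, declared WITH COMPACT STORAGE; the helper names and the column names key, column1 and value are illustrative assumptions, not an existing API. It only builds statement strings; a real client should bind values as parameters instead of splicing literals.

```python
# Hypothetical helpers: build CQL3 text that mirrors a Thrift-style dynamic column write.
# Assumed table layout: (key text, column1 text, value text), PRIMARY KEY (key, column1).


def cql_literal(s: str) -> str:
    """Quote a string as a CQL text literal (single quotes are escaped by doubling)."""
    return "'" + s.replace("'", "''") + "'"


def compact_table_ddl(table: str) -> str:
    """DDL for a wide-row table: the clustering column 'column1' acts as the column name."""
    return (
        f"CREATE TABLE {table} (\n"
        "  key text,\n"
        "  column1 text,\n"
        "  value text,\n"
        "  PRIMARY KEY (key, column1)\n"
        ") WITH COMPACT STORAGE;"
    )


def cli_set_to_cql(table: str, row_key: str, column: str, value: str) -> str:
    """Roughly the CQL3 form of the cassandra-cli write: SET table[row_key][column] = value;"""
    return (
        f"INSERT INTO {table} (key, column1, value) "
        f"VALUES ({cql_literal(row_key)}, {cql_literal(column)}, {cql_literal(value)});"
    )


if __name__ == "__main__":
    print(compact_table_ddl("mycf"))
    # Each new column name is just a new clustering value, so no ALTER TABLE is needed.
    print(cli_set_to_cql("mycf", "id", "arbitrary_column", "foo"))
```

With compact storage and a single clustering column, the stored cell name is just the clustering value, which is why this behaves like the Thrift-era dynamic column.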


On Mon, May 27, 2013 at 10:42 AM, Hiller, Dean <Dean.Hiller@nrel.gov> wrote:

> Wide rows and dynamic columns are still possible in CQL3. There are some
> links here: http://comments.gmane.org/gmane.comp.db.cassandra.user/30321
>
> Also, there are other advantages to NoSQL besides the schemaless aspect,
> such as that it can accept tons of writes and you can scale the writes (you
> can't do that with an RDBMS). With an RDBMS you can typically scale the
> reads with backups and the like, but there are limits there too. There are
> no such limits with NoSQL: just double your nodes and get double the read
> throughput. This has nothing to do with how much you can store. You may
> only be storing 200G, with amazing write/read throughput, including TONS of
> deletes to keep it under 200G.
>
> That brings me to the next advantage: storing huge amounts of data. If you
> have 1000 machines with 300G on each machine, you are storing 300T, or
> about 1/3 of a petabyte. Have fun with an RDBMS.
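The storage figure above is plain multiplication. This small Python sketch assumes decimal units (1T = 1000G, 1P = 1000T) and confirms that 1000 nodes x 300G is 300T, i.e. 0.3P, close to the "1/3 petabyte" quoted. The replication_factor parameter is an added assumption not mentioned in the email: each row is stored on several nodes, so the unique data that fits is the raw total divided by the replication factor, and the 300T figure is the raw disk total.

```python
# Back-of-the-envelope cluster capacity, using decimal units (assumption):
# 1 TB = 1000 GB, 1 PB = 1000 TB.


def raw_capacity_tb(nodes: int, gb_per_node: float) -> float:
    """Total disk across the cluster, in TB."""
    return nodes * gb_per_node / 1000


def unique_data_tb(nodes: int, gb_per_node: float, replication_factor: int = 1) -> float:
    """Distinct data that fits when every row is stored replication_factor times (assumption)."""
    return raw_capacity_tb(nodes, gb_per_node) / replication_factor


if __name__ == "__main__":
    raw = raw_capacity_tb(1000, 300)
    print(f"raw: {raw} TB = {raw / 1000} PB")  # raw: 300.0 TB = 0.3 PB
    print(f"with RF=3: {unique_data_tb(1000, 300, 3)} TB of unique data")
```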
>
> So yes, schemaless is one advantage, throughput is another, and total
> storage room is yet another. HA is probably debatable, but in my opinion HA
> has been another advantage we have seen. We have already had a hardware
> outage with no downtime with Cassandra, whereas on a previous project
> Oracle RAC did not really live up to its promises. There may be other
> advantages I am missing as well.
>
> Also, PlayOrm, a Java client, currently uses Thrift (Astyanax specifically),
> and so do a ton of projects right now. I know PlayOrm is about to upgrade
> to CQL3 as well, so it will be able to use either Thrift or CQL3 in the
> future.
>
> Later,
> Dean
>
> From: Matthew Hillsborough <matthew.hillsborough@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Monday, May 27, 2013 8:28 AM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Using CQL to insert a column to a row dynamically
>
> Hi all,
>
> I posted a similar thread on Stack Overflow; I hope it's not repetitive for
> anyone here. I'm looking for better insight from the community on whether
> Cassandra is the right tool for me or not.
>
> I am trying to understand some fundamentals of Cassandra. I was under the
> impression that one of the advantages available to a developer designing a
> data model is the ability to dynamically add columns to a row identified by
> a key. That means that, where it makes sense, a key can be something such
> as a user_id from a relational database, and I can, for example, create an
> arbitrary number of columns that relate to that user.
>
> What I'm not understanding is why there is so much emphasis on predefined
> columns in CQL examples, particularly in the CREATE TABLE/COLUMNFAMILY
> examples:
>
> CREATE TABLE emp (
>   empID int,
>   deptID int,
>   first_name varchar,
>   last_name varchar,
>   PRIMARY KEY (empID, deptID)
> );
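The emp example above is itself a wide-row layout: in PRIMARY KEY (empID, deptID), empID is the partition key (the old row key) and deptID is a clustering column, so every distinct deptID written for an employee adds more cells to the same storage row without any ALTER TABLE. The Python sketch below is a simplified, hypothetical model of that mapping, assuming a table without COMPACT STORAGE whose cell names are composites of the clustering value and the CQL column name; it deliberately ignores timestamps, row markers and on-disk encoding, and the function names are illustrative only.

```python
from collections import defaultdict

# Simplified model of how CQL3 rows map onto storage rows ("partitions").
# Assumption: non-compact table, cell name = (clustering value, CQL column name).
# Timestamps, row markers and on-disk encoding are deliberately left out.

# partition key -> {(clustering value, column name): value}
storage: dict = defaultdict(dict)


def insert_emp(emp_id: int, dept_id: int, first_name: str, last_name: str) -> None:
    """Model of: INSERT INTO emp (empID, deptID, first_name, last_name) VALUES (...)."""
    row = storage[emp_id]  # one storage row per partition key
    row[(dept_id, "first_name")] = first_name
    row[(dept_id, "last_name")] = last_name


def storage_row(emp_id: int) -> list:
    """Cells of one partition, sorted by cell name as the storage engine keeps them."""
    return sorted(storage[emp_id].items())


if __name__ == "__main__":
    insert_emp(1, 10, "Jane", "Doe")
    insert_emp(1, 20, "Jane", "Doe")  # new deptID: new cells in the same partition
    for cell_name, value in storage_row(1):
        print(cell_name, value)
```

So the predefined columns constrain the shape of each cell name, but the number of cells per partition is still open-ended.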
>
> Wouldn't this type of model make more sense to just stuff into a
> relational database? What if I don't know my column name until runtime and
> need to create it dynamically? Do I have to use ALTER TABLE to add a new
> column to the row using CQL? For the particular app use case I have in
> mind, I would just need a key identifier and arbitrary column names, where
> a column name might include a timestamp+variable_identifier. The whole
> point is to have extremely wide rows with the wonderful performance that
> Cassandra has to offer. As of right now, everything I'm reading says
> DataStax recommends CQL over Thrift (I think what I'm describing is
> possible with Thrift, but correct me if I'm wrong). That means I'd have to
> go AGAINST the recommendation and use a protocol that's pretty much going
> to eventually not be supported.
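For the use case described above (column names built from a timestamp plus a variable identifier), the usual CQL3 equivalent is a compound clustering key, for example a hypothetical table events (key text, ts timestamp, ident text, value text, PRIMARY KEY (key, ts, ident)). Cells in a partition are kept sorted by (ts, ident), which is what makes time-range slices over a very wide row cheap. The Python sketch below models only that ordering and slicing with bisect; the class and method names are assumptions for illustration, not a Cassandra API.

```python
import bisect

# Hypothetical model of one wide partition whose "column names" are (timestamp, identifier).
# Cassandra keeps cells sorted by clustering key; a sorted list of tuples stands in for that.


class WidePartition:
    def __init__(self) -> None:
        self._names: list = []   # sorted (ts, ident) pairs
        self._values: dict = {}  # (ts, ident) -> value

    def put(self, ts: int, ident: str, value: str) -> None:
        """Add or overwrite a dynamic column; no schema change is involved."""
        name = (ts, ident)
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice(self, start_ts: int, end_ts: int) -> list:
        """Columns with start_ts <= ts < end_ts, in clustering order.

        Roughly: SELECT ... WHERE key = ? AND ts >= start_ts AND ts < end_ts
        """
        # (ts,) sorts before every (ts, ident), so bisect_left finds the first cell at ts.
        lo = bisect.bisect_left(self._names, (start_ts,))
        hi = bisect.bisect_left(self._names, (end_ts,))
        return [(name, self._values[name]) for name in self._names[lo:hi]]


if __name__ == "__main__":
    p = WidePartition()
    p.put(1369670000, "sensor-b", "21.5")
    p.put(1369660000, "sensor-a", "20.1")
    p.put(1369680000, "sensor-a", "22.0")
    print(p.slice(1369660000, 1369680000))
    # [((1369660000, 'sensor-a'), '20.1'), ((1369670000, 'sensor-b'), '21.5')]
```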
>
> Is Cassandra the right tool for that? Are the predefined columns in the
> documentation nothing more than an example? How does one add a dynamic
> column name to an existing column family/table? If I'm stuck with static
> columns, how is this any different from using a relational database such
> as Postgres or MySQL? What I found really powerful about Cassandra is being
> able to do something like the following in cassandra-cli, which uses
> Thrift:
>
> SET mycf[id]['arbitrary_column'] = 'foo';
>
> However, doing that in CQL isn't possible. That completely limits the way
> I was going to model my data for an application, and it would have no
> distinct advantage over a relational database.
>
>
> Please tell me I'm an idiot and/or am wrong and how I can make this work.
> It seems Thrift is the only solution, but I hate going against the
> recommended protocol.
>
>
> Thanks.
>

--001a11c37dd8696bfb04ddb639a0--