The problem with "being careful about how much you store in a collection" is that Cassandra is a blind-write system. Checking how much data is currently in the collection before you write is an anti-pattern: read before write.

Cassandra Rule 1: DON'T READ BEFORE WRITE
Cassandra Rule 2: ROWS CAN HAVE 2 BILLION COLUMNS
Collection Rule 1: DON'T STORE MORE THAN 100 THINGS IN A COLLECTION
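
To make the conflict concrete, here is a minimal sketch. It assumes a hypothetical set<text> column named tags on the user table from the original question (that column is not in the original schema). The append is a blind write, so nothing in the write path tells the client how many elements the collection already holds:

-- assumption: tags is a hypothetical set<text> column added for illustration
cqlsh:simplex> alter table user add tags set<text>;
-- appends without reading; the client never learns the current size
cqlsh:simplex> update user set tags = tags + {'admin'} where firstname = 'joe';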

Why are users confused? It's simple.

On Thu, Jun 6, 2013 at 10:51 AM, Eric Stevens <mightye@gmail.com> wrote:
CQL3 does now support dynamic columns. For tags or metadata values you could use a Collection:

This should probably be clarified.  A collection is a super useful tool, but it is not the same thing as a dynamic column.  It has many advantages, but there is one huge disadvantage in that you have to be careful how much data you store in a collection. When you read a single value out of a collection, the entire collection is always read, which of course is true for appending data to the collection as well. 

With a traditional dynamic column, you could have added things like event logs to a record in the form of keys named "event:someEvent:TS" (or reverse the order as your needs dictate).  You could do this practically indefinitely with little degradation in performance.  This was also a common way of representing cross-family relationships (one-to-many style).

If you try to do the same thing with a collection, performance will degrade as your data grows.  For small or relatively static data sets (e.g. tags) that's fine.  For open-ended data sets (logs, events, one-to-many relationships that grow regularly), you should instead normalize such data into a separate column family.
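
As a rough sketch of that normalization (the table and column names here are hypothetical, chosen only for illustration), an open-ended event log could live in its own table keyed by user and clustered by time, so each event is a separate cell rather than an element of one collection:

-- assumption: user_events and its columns are illustrative names
cqlsh:simplex> create table user_events (firstname text, event_time timestamp, event text, primary key (firstname, event_time));
cqlsh:simplex> insert into user_events (firstname, event_time, event) values ('joe', '2013-06-06 10:51:00+0000', 'login');
-- reads can be restricted to a time range instead of pulling everything
cqlsh:simplex> select * from user_events where firstname = 'joe' and event_time > '2013-06-01';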

-Eric Stevens
ProtectWise, Inc.


On Thu, Jun 6, 2013 at 9:49 AM, Francisco Andrades Grassi <bigjocker@gmail.com> wrote:
Hi,

CQL3 does now support dynamic columns. For tags or metadata values you could use a Collection:
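
(The example that followed here is not preserved in this copy of the thread. A minimal sketch of the idea, assuming a hypothetical map<text, text> column named metadata on the user table from the question, might look like this:)

-- assumption: metadata is a hypothetical column, not part of the original schema
cqlsh:simplex> alter table user add metadata map<text, text>;
cqlsh:simplex> update user set metadata['middlename'] = 'lester' where firstname = 'joe';
cqlsh:simplex> select firstname, metadata from user;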


For wide rows there are the enhanced primary keys, which I personally prefer over the composite columns of yore:
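
(The example that followed here is likewise not preserved. A minimal sketch, assuming a hypothetical user_attributes table with a compound primary key, where firstname is the partition key and attribute is the clustering column:)

-- assumption: user_attributes and its columns are illustrative names
cqlsh:simplex> create table user_attributes (firstname text, attribute text, value text, primary key (firstname, attribute));
cqlsh:simplex> insert into user_attributes (firstname, attribute, value) values ('joe', 'middlename', 'lester');
-- each attribute is stored as its own entry within the 'joe' partition
cqlsh:simplex> select * from user_attributes where firstname = 'joe';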


--
Francisco Andrades Grassi
@bigjocker

On Jun 6, 2013, at 8:32 AM, Joe Greenawalt <joe.greenawalt@gmail.com> wrote:

Hi,
I'm having some problems figuring out how to append a dynamic column to a column family using the DataStax Java driver 1.0 and CQL3 on Cassandra 1.2.5.  Below is what I'm trying:

cqlsh:simplex> create table user (firstname text primary key, lastname text);
cqlsh:simplex> insert into user (firstname, lastname) values ('joe','shmoe');
cqlsh:simplex> select * from user;

 firstname | lastname
-----------+----------
       joe |    shmoe

cqlsh:simplex> insert into user (firstname, lastname, middlename) values ('joe','shmoe','lester');
Bad Request: Unknown identifier middlename
cqlsh:simplex> insert into user (firstname, lastname, middlename) values ('john','shmoe','lester');
Bad Request: Unknown identifier middlename


I'm assuming you can do this based on previous Thrift-based clients like pycassa, and also from reading this:

The Cassandra data model is a dynamic schema, column-oriented data model. This means that, unlike a relational database, you do not need to model all of the columns required by your application up front, as each row is not required to have the same set of columns. Columns and their metadata can be added by your application as they are needed without incurring downtime to your application.

here: http://www.datastax.com/docs/1.2/ddl/index

Is it a limitation of CQL3 and its connection vs. Thrift?
Or, more likely, am I just doing something wrong?

Thanks,
Joe