cassandra-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: CQL support for compound columns
Date Mon, 02 Jan 2012 17:52:04 GMT
Maybe the ship on this has sailed, but I am a bit miffed  on "create
table". CQL is going out of its way to make things so easy for people. But
if someone does not understand the concept of a column family making it
easy for them to design something that is an anti-pattern is odd to me.

As an admin I have been called many times to troubleshoot database
performance issues databases. It sometimes boils down to a bad schema
choice. At later/production stages these become hard to dig out of. It
usually takes more hardware, converting GB or TB of data, application cut
overs.

I do not call "column families" "tables". If someone newer to cassandra did
I would correct them. Why not call Java references pointers? I hate being
ambiguous on key terminology.



On Mon, Jan 2, 2012 at 11:53 AM, Eric Evans <eevans@acunu.com> wrote:

> On Sat, Dec 31, 2011 at 1:12 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
> > On Fri, Dec 30, 2011 at 12:30 PM, Eric Evans <eevans@acunu.com> wrote:
> >>> CREATE TABLE timeline (
> >>>    user_id int,
> >>>    posted_at uuid,
> >>>    body string,
> >>>    posted_by string,
> >>>    PRIMARY KEY(user_id, posted_at, posted_by),
> >>>    VALUE(body)
> >>> );
> >>
> >> I think the value declaration also helps in that it's one more thing
> >> that provides cues as to the data model it creates (more expressive).
> >> But this got me thinking, why not introduce something special for the
> >> composite name as well?  That way the PRIMARY KEY syntax (which comes
> >> preloaded with meaning and expectations) could be kept more SQLish,
> >> and the whole thing looks more like an extension to the language as
> >> opposed to a modification.
> >>
> >> Say:
> >>
> >> CREATE TABLE timeline (
> >>  user_id int PRIMARY KEY,
> >>  posted_at uuid,
> >>  body text,
> >>  posted_by text,
> >>  COMPOSITE_NAME(posted_at, posted_by),
> >>  COMPOSITE_VALUE(body)
> >> )
> >
> > I went back and forth on this mentally, but I come down as -0 on CN
> > instead of PK.  For two reasons:
> >
> > First, the composite PRIMARY KEY is a better description of what you
> > can actually do with the data.  In a relational model, a PK of user_id
> > means there is only one (user_id, posted_at, body, posted_by) row with
> > a given user_id.  Which is not the case here.  PK = (row key +
> > composite components) captures exactly what is "immutable and unique"
> > in a given object, so it's actually exactly what it's meant for and
> > not an abuse at all.  (It even fits nicely with the "queries involving
> > the PK are always indexed" assumption that isn't required by the SQL
> > standard but every other database does anyway because it makes the
> > most sense.)
>
> Yeah, you're right, PK is a better fit for this.
>
> Now that I'm forced to think about it a bit more, I think my un-SQL
> reaction is probably rooted more in the abuse of the PRIMARY KEY
> syntax, than the meaning it conveys.
>
> In SQL, PRIMARY KEY is a modifier to a column spec, and here PRIMARY
> KEY(user_id, posted_at, posted_by) reads like a PRIMARY modifier
> applied to a KEY() function.  It's also a little strange the way it
> appears in the grouping of column specs, when it's actually defining a
> grouping or relationship of them (maybe this is what you meant about
> using TRANSPOSED WITH <options> to emphasize the non-standard).
>
> I wonder if there isn't a way to keep the PRIMARY KEY connection while
> making it a little more SQL (and hence more intuitive).  Maybe
> something like:
>
>
> CREATE TABLE timeline (
>  (user_id int, posted_at uuid, posted_by) PRIMARY KEY,
>  body text
> )
>
>
> --
> Eric Evans
> Acunu | http://www.acunu.com | @acunu
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message