Hi Michael,

Thanks for the feedback; it all makes sense.

If anyone wants me to raise a JIRA ticket for docs on (key1, key2) vs ((key1, key2)) and their implications, or for fixing that if block in SelectStatement, let me know. For the if block, though, it's probably best if a C* expert raises that ticket, so it uses the right terminology to describe the problem ;)
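For anyone reading this later, here is a rough Python sketch of the (key1, key2) vs ((key1, key2)) distinction. `parse_primary_key` and `PrimaryKey` are hypothetical names used only for illustration, not part of any Cassandra driver, and the parser only handles these two simple shapes of the PRIMARY KEY column list. The idea is that with single parentheses only the first column is the partition key and the rest are clustering columns, while the extra inner parentheses make all the enclosed columns a composite partition key.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PrimaryKey:
    partition_key: List[str]       # columns hashed to pick the partition
    clustering_columns: List[str]  # columns that order rows inside a partition


def parse_primary_key(spec: str) -> PrimaryKey:
    """Hypothetical helper: split the column list of a CQL PRIMARY KEY clause.

    Handles only the simple forms '(a, b, ...)' and '((a, b), c, ...)'.
    """
    spec = spec.strip()
    if not (spec.startswith("(") and spec.endswith(")")):
        raise ValueError(f"expected a parenthesised column list: {spec!r}")
    inner = spec[1:-1].strip()
    if inner.startswith("("):
        # Inner parentheses: every column inside them forms a composite partition key.
        close = inner.index(")")
        partition = [c.strip() for c in inner[1:close].split(",")]
        rest = inner[close + 1:].strip().lstrip(",")
    else:
        # No inner parentheses: only the first column is the partition key.
        first, _, rest = inner.partition(",")
        partition = [first.strip()]
    clustering = [c.strip() for c in rest.split(",") if c.strip()]
    return PrimaryKey(partition, clustering)


print(parse_primary_key("(key1, key2)"))
# PrimaryKey(partition_key=['key1'], clustering_columns=['key2'])
print(parse_primary_key("((key1, key2))"))
# PrimaryKey(partition_key=['key1', 'key2'], clustering_columns=[])
```

Roughly, the practical implication is that with (key1, key2) you can ask for a range of key2 values inside one key1 partition. With ((key1, key2)) you need both values to locate a partition, and there is no clustering column left to range over.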

Kind regards,


On Fri, Mar 14, 2014 at 12:49 AM, Laing, Michael <michael.laing@nytimes.com> wrote:
These are my personal opinions, reflecting both my long experience with database systems and my newness to Cassandra...


The Cassandra contributors, having made its history, tend to describe it in terms of implementation rather than action. And its implementation has a history, all relatively recent, that many know but that is obscure to newcomers like me and, frankly, not particularly relevant.

Note: we are all trying to understand Crimea now, and to really understand, you have to ingest several hundred years of history. Luckily, Cassandra has not been around quite so long!

But Cassandra's history creeps into the nomenclature of CQL3. So what might logically be called a 'hash key' is called a 'partition key', and what is called a 'clustering key' might, IMHO, be better termed a 'range key'.

The 'official' terms in the nomenclature are important to know; they just don't describe the actions one takes as a user of them. However, they have meaning to those who have 'lived' the history of Cassandra, and they form an important bridge to the past.

As a new user I found them non-intuitive. Amazon has done a much better job with DynamoDB - muddled, however, by bad syntax choices.

But you adjust and mentally map... I am still bumfuzzled when people talk of slices and other C* cruft, but I just let it slide by like lectures from my mother. That and Thrift can just fade into history with Gopher and Lynx as far as I am concerned - CQL3 is where it's at.

But another thing to remember is that performance is king - and to get performance you fly 'close to the metal'. Cassandra does that, and you should know the code paths, the physical structures, and the characteristics of your 'metal' to understand how to build high-performing apps.


The answer to both asterisks is yes. You should use the term 'clustering column' because that is what is in the docs, but you should think 'range key' for how you use it. Similarly, 'partition key' maps to 'hash key'.
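To make the 'hash key' / 'range key' mapping concrete, here is a toy in-memory sketch in Python. `ToyTable`, `token` and `range_query` are hypothetical names, not Cassandra APIs, and MD5 is only a stand-in hash (Cassandra's default partitioner is Murmur3). It shows the usage pattern, not the real storage engine: partitions are placed by a hash of the partition key, so you look them up by equality, while rows inside a partition are ordered by the clustering column, so range queries happen there.

```python
import hashlib
from collections import defaultdict


def token(partition_key: tuple) -> int:
    """Hash a partition key to a 64-bit token.

    MD5 is only a stand-in here; Cassandra's default partitioner is Murmur3.
    """
    digest = hashlib.md5(repr(partition_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ToyTable:
    """Toy model of one CQL3 table: the partition key acts as a 'hash key',
    the clustering column acts as a 'range key' within a partition."""

    def __init__(self):
        # partition key -> {clustering value: row value}
        self.partitions = defaultdict(dict)

    def insert(self, partition_key: tuple, clustering, value) -> None:
        # Writes are upserts: the same (partition, clustering) pair overwrites.
        self.partitions[partition_key][clustering] = value

    def partitions_in_token_order(self) -> list:
        # Partitions are laid out by hashed token, so their order looks random;
        # hence no meaningful range scans over the partition key itself.
        return sorted(self.partitions, key=token)

    def range_query(self, partition_key: tuple, lo, hi) -> list:
        # Within one partition, rows are ordered by clustering value,
        # so a range over the clustering column is natural.
        rows = self.partitions.get(partition_key, {})
        return [rows[c] for c in sorted(rows) if lo <= c <= hi]


t = ToyTable()
t.insert(("sensor-1",), 3, "c")
t.insert(("sensor-1",), 1, "a")
t.insert(("sensor-1",), 2, "b")
print(t.range_query(("sensor-1",), 1, 2))  # ['a', 'b']
```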

Good luck,