cassandra-user mailing list archives

From Ryan Svihla <>
Subject Re: Comprehensive documentation on Cassandra Data modelling
Date Tue, 16 Dec 2014 17:36:57 GMT
Data modeling a distributed application could be a book unto itself.
However, I will add that modeling by restriction is basically the entire
thought process in Cassandra data modeling. Cassandra is a distributed hash
table, and a core requirement of that sort of system is being able to
quickly locate which server in the cluster owns the data you want (which
is determined by the partition key).
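To make the "which server owns the data" idea concrete, here is a minimal Python sketch of a token ring. It is a simplification under stated assumptions: the node names and tokens are hypothetical, each node owns a single range (no vnodes, no replication), and MD5 stands in for the Murmur3 hash that Cassandra's default partitioner uses. The only point is that hashing the partition key is enough to pick the owning node, with no search across the cluster.

```python
import bisect
import hashlib


def token(partition_key: str) -> int:
    # Hash the partition key to a 64-bit token. Cassandra's default
    # partitioner uses Murmur3; MD5 is used here only because it ships
    # with Python's standard library (simplifying assumption).
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class TokenRing:
    """Hypothetical simplified ring: each node owns the range ending at its token."""

    def __init__(self, node_tokens: dict[str, int]):
        # Sort nodes by the token that closes their range.
        self._entries = sorted((t, n) for n, t in node_tokens.items())
        self._tokens = [t for t, _ in self._entries]

    def owner(self, partition_key: str) -> str:
        # The first node whose token is >= the key's token owns the key;
        # keys past the highest token wrap around to the first node.
        i = bisect.bisect_left(self._tokens, token(partition_key))
        return self._entries[i % len(self._entries)][1]
```

Usage: `TokenRing({"node-a": 2**62, "node-b": 2**63, "node-c": 3 * 2**62}).owner("sensor-42")` returns the same node on every call, which is what lets a client go straight to the right server instead of asking all of them.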

In specific response to your questions:
1) As long as you know the primary key and the column name, this just works.
I'm not sure what the problem is.
2) Yes. The partition key tells you which server owns the data; otherwise
you'd have to scan all servers to find what you're asking for.
3) I'm not sure I understand this.
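For questions 1 and 3, a toy in-memory model may help. It assumes a table whose primary key is a (partition key, clustering key) pair; the `Table` class and its methods are hypothetical and are not driver APIs. In CQL, both INSERT and UPDATE behave as upserts addressed by the full primary key, which is why setting a non-key column after later processing "just works". Key columns, by contrast, cannot be updated in place, so moving a row to a new key is a delete plus a reinsert, as the hypothetical `move` helper shows.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical toy table: primary key = (partition key, clustering key)."""

    rows: dict = field(default_factory=dict)

    def upsert(self, pk, ck, **cols):
        # Writes are upserts: a full primary key addresses exactly one row,
        # so non-key columns can be set later without reading first.
        self.rows.setdefault((pk, ck), {}).update(cols)

    def get(self, pk, ck):
        return self.rows.get((pk, ck))

    def delete(self, pk, ck):
        self.rows.pop((pk, ck), None)

    def move(self, pk, ck, new_pk, new_ck):
        # Key columns cannot be updated in place: "changing" one means
        # removing the row and reinserting it under the new key.
        # Raises KeyError if the original row does not exist.
        row = self.rows.pop((pk, ck))
        self.upsert(new_pk, new_ck, **row)
```

For example, upserting `status="pending"` for key `("job-1", "2014-12-16")` and later upserting `result="done"` under the same key leaves one row holding both columns.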

To summarize, all modeling can be understood once you embrace two ideas:

   1. Querying a single server will be faster than querying many servers.
   2. Multiple tables with the same data but different partition keys
   are much easier to scale than a single table that requires scanning the
   whole cluster for your answer.
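The second principle (one table per query, each partitioned on the column you look up by) can be sketched the same way. This is an illustrative Python model, not how Cassandra stores data: the three node names, the `users_by_id` / `users_by_email` tables, and the CRC32-modulo placement are all hypothetical simplifications. It counts how many nodes each read touches: a lookup by partition key asks one node, while filtering on a non-key column has to ask every node.

```python
import zlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical 3-node cluster


def node_for(partition_key: str) -> str:
    # Simplified placement (CRC32 modulo node count), not Cassandra's ring.
    return NODES[zlib.crc32(partition_key.encode("utf-8")) % len(NODES)]


class Cluster:
    def __init__(self):
        # storage[node][table][partition_key] -> list of rows
        self.storage = {n: {} for n in NODES}
        self.nodes_contacted = 0

    def write(self, table, pk, row):
        node_tables = self.storage[node_for(pk)]
        node_tables.setdefault(table, {}).setdefault(pk, []).append(row)

    def read_partition(self, table, pk):
        # Partition key known: exactly one node is asked.
        self.nodes_contacted += 1
        return self.storage[node_for(pk)].get(table, {}).get(pk, [])

    def scan(self, table, predicate):
        # No partition key: every node must be asked.
        found = []
        for node in NODES:
            self.nodes_contacted += 1
            for rows in self.storage[node].get(table, {}).values():
                found.extend(r for r in rows if predicate(r))
        return found


def add_user(cluster, user_id, email):
    # Denormalize on write: the same row goes into two tables,
    # each partitioned by the column it will be queried on.
    row = {"user_id": user_id, "email": email}
    cluster.write("users_by_id", user_id, row)
    cluster.write("users_by_email", email, row)
```

The cost of this design is paid on write (every user is stored twice); the benefit is that each read becomes a single-partition read. In Cassandra itself, keeping such tables in step is the application's job; one common approach is to write them together in a logged batch.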

If you accept this, you've basically got the key principle down. Most
other ideas are extensions of it; some of the nuances include dealing with
tombstones, partition size, and ordering, and I can answer any more specifics.
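On the tombstone and ordering nuances, a small sketch of a single partition may help. It is a toy under stated assumptions: the `Partition` class is hypothetical, rows are kept sorted by clustering key (the order in which Cassandra returns rows within a partition), and a delete writes a tombstone marker instead of removing data. In Cassandra, tombstones are only purged by compaction after `gc_grace_seconds`, so until then a range read has to scan past them, which is one reason large, delete-heavy partitions get slow.

```python
import bisect

TOMBSTONE = object()  # marker written by a delete


class Partition:
    """Hypothetical toy partition: rows kept sorted by clustering key."""

    def __init__(self):
        self._keys = []  # clustering keys, always sorted
        self._values = {}

    def write(self, ck, value):
        if ck not in self._values:
            bisect.insort(self._keys, ck)
        self._values[ck] = value

    def delete(self, ck):
        # A delete is itself a write: it records a tombstone rather than
        # removing the data in place.
        self.write(ck, TOMBSTONE)

    def slice(self, start, end):
        # Rows come back in clustering order; tombstones inside the range
        # still have to be read and skipped.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_right(self._keys, end)
        live, skipped = [], 0
        for ck in self._keys[lo:hi]:
            if self._values[ck] is TOMBSTONE:
                skipped += 1
            else:
                live.append((ck, self._values[ck]))
        return live, skipped
```

Writing keys 0 to 4, deleting 1 and 3, and then slicing 0 to 4 returns three live rows while reporting two tombstones skipped.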

I've been meaning to write a series of blog posts on this, but as I stated,
it's almost a book unto itself. Data modeling a distributed application
requires a fundamental rethink of all the assumptions we've been taught for
master/slave style databases.

On Tue, Dec 16, 2014 at 10:46 AM, Jason Kania <> wrote:
> Hi,
> I have been having a few exchanges with contributors to the project around
> what is possible with Cassandra and a common response that comes up when I
> describe functionality as broken or missing is that I am not modelling my
> data correctly. Unfortunately, I cannot seem to find comprehensive
> documentation on modelling with Cassandra. In particular, I am finding
> myself modelling by restriction rather than what I would like to do.
> Does such documentation exist? If not, is there any effort to create such
> documentation? The DataStax documentation on data modelling is far too weak
> to be meaningful.
> In particular, I am stuck because:
> 1) I want to search on a specific column to make updates to it after
> further processing, i.e., I don't know its value on first insert
> 2) If I want to search on a column, it has to be part of the primary key
> 3) If a column is part of the primary key, it cannot be edited so I have a
> circular dependency
> Thanks,
> Jason



Ryan Svihla

Solution Architect

