cassandra-user mailing list archives

From Jason Kania <jason.ka...@ymail.com>
Subject Re: Comprehensive documentation on Cassandra Data modelling
Date Tue, 16 Dec 2014 18:01:00 GMT
Ryan,
Thanks for the response. It offers a bit more clarity.
I think a series of blog posts with good real-world examples would go a long way toward improving
the usability of Cassandra. Right now the process feels like walking through a minefield, because
I only discover what is not possible after trying something that seems logical to me and
failing.

For my specific questions, the problem is that since searching is only possible on columns
in the primary key and the primary key cannot be updated, I am not sure what the appropriate
solution is when data exists that needs to be searched and then updated. What is the preferable
approach to this? Is the expectation to maintain a series of tables, one for each stage of
data manipulation with its own primary key?
Thanks,
Jason
      From: Ryan Svihla <rsvihla@datastax.com>
 To: user@cassandra.apache.org 
 Sent: Tuesday, December 16, 2014 12:36 PM
 Subject: Re: Comprehensive documentation on Cassandra Data modelling
   
Data modeling a distributed application could be a book unto itself. However, I will add that
modeling by restriction is basically the entire thought process in Cassandra data modeling.
Cassandra is a distributed hash table, and a core aspect of that sort of application is that you
need to be able to quickly locate which server in the cluster owns the data you want (which is
provided by the partition key).
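
As a rough illustration of how a partition key locates the owning server, here is a minimal Python sketch. It is a simplification, not Cassandra's actual mechanism: Cassandra's default partitioner uses Murmur3 tokens placed on a ring, whereas this sketch hashes with MD5 and takes a modulo. The node names and the `owner` helper are hypothetical.

```python
import hashlib

# Hypothetical cluster members, for illustration only.
NODES = ["node-a", "node-b", "node-c"]


def owner(partition_key: str, nodes=NODES) -> str:
    """Map a partition key to one node by hashing it.

    Simplified stand-in for a partitioner: the same key always
    hashes to the same node, so a lookup by key touches one server.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return nodes[token % len(nodes)]


if __name__ == "__main__":
    # With the key, the owner is computed directly; without it,
    # every node would have to be asked.
    print(owner("sensor-42"))
```

The point of the sketch is only that the key alone determines the server. A query that does not supply the partition key has no such shortcut.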

In specific response to your questions:
1) As long as you know the primary key and the column name, this just works. I'm not sure what
the problem is.
2) Yes, the partition key tells you which server owns the data, otherwise you'd have to scan
all servers to find what you're asking for.
3) I'm not sure I understand this.

To summarize, all modeling can be understood when you embrace the idea that:

   - Querying a single server will be faster than querying many servers.
   - Multiple tables with the same data but with different partition keys are much easier to
scale than a single table where you have to scan the whole cluster for your answer.
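
The second point can be sketched in plain Python, with dictionaries standing in for tables. This is an assumption-laden toy, not Cassandra code: the table names, the `status` column, and both helper functions are hypothetical, and in a real cluster the application would issue the equivalent writes to two CQL tables itself.

```python
from collections import defaultdict

# Two hypothetical "tables" holding the same rows, each keyed by a
# different partition key so each query reads a single partition.
readings_by_sensor = defaultdict(list)
readings_by_status = defaultdict(list)


def insert_reading(sensor_id, reading_id, status):
    """Write the same row to both tables (denormalization)."""
    row = {"sensor_id": sensor_id, "reading_id": reading_id, "status": status}
    readings_by_sensor[sensor_id].append(dict(row))
    readings_by_status[status].append(dict(row))


def change_status(sensor_id, reading_id, new_status):
    """Change a value that is a key in one of the tables.

    In readings_by_sensor the status is an ordinary column and is
    updated in place. In readings_by_status it is the key, so the row
    is deleted under the old key and re-inserted under the new one.
    """
    for row in readings_by_sensor[sensor_id]:
        if row["reading_id"] == reading_id:
            old_status = row["status"]
            row["status"] = new_status
            readings_by_status[old_status] = [
                r for r in readings_by_status[old_status]
                if not (r["sensor_id"] == sensor_id
                        and r["reading_id"] == reading_id)
            ]
            readings_by_status[new_status].append(dict(row))
            return True
    return False
```

Used this way, a lookup by sensor or by status each reads one key, and "updating" a key column becomes a delete plus an insert rather than an in-place edit.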

If you accept this, you've basically got the key principle down. Most other ideas are extensions
of it; the nuances include dealing with tombstones, partition size, and ordering. I can
answer any more specific questions.

I've been meaning to write a series of blog posts on this, but as I stated, it's almost a
book unto itself. Data modeling a distributed application requires a fundamental rethink of
all the assumptions we've been taught for master/slave style databases.




On Tue, Dec 16, 2014 at 10:46 AM, Jason Kania <jason.kania@ymail.com> wrote:
Hi,
I have been having a few exchanges with contributors to the project around what is possible
with Cassandra and a common response that comes up when I describe functionality as broken
or missing is that I am not modelling my data correctly. Unfortunately, I cannot seem to find
comprehensive documentation on modelling with Cassandra. In particular, I am finding myself
modelling by restriction rather than what I would like to do.

Does such documentation exist? If not, is there any effort to create such documentation? The
DataStax documentation on data modelling is far too weak to be meaningful.

In particular, I am caught because:
1) I want to search on a specific column to make updates to it after further processing; i.e.
I don't know its value on first insert
2) If I want to search on a column, it has to be part of the primary key
3) If a column is part of the primary key, it cannot be edited, so I have a circular dependency
Thanks,
Jason



-- 
Ryan Svihla
Solution Architect
 

DataStax is the fastest, most scalable distributed database technology, delivering Apache
Cassandra to the world’s most innovative enterprises. DataStax is built to be agile, always-on,
and predictably scalable to any size. With more than 500 customers in 45 countries, DataStax
is the database technology and transactional backbone of choice for the world’s most innovative
companies such as Netflix, Adobe, Intuit, and eBay. 


  