cassandra-user mailing list archives

From Jeremy Hanna <jeremy.hanna1...@gmail.com>
Subject Re: Cassandra gotchas ...
Date Sun, 09 Jan 2011 02:22:15 GMT
> I know that Cassandra is a work in progress and there are many
> limitations I can live with, but it would be nice to know what the
> roadmap is for the next 12-24 months so we can get an idea of what major
> directions Cassandra is going in so we can plan accordingly. 

Take a look at Jira (https://issues.apache.org/jira/browse/CASSANDRA); there are many, many
tickets slated for 0.7.1 and 0.8.  You can also get involved by taking part in discussions
on the dev list.  If you feel that a feature is lacking, you can create tickets in Jira.

> It would be nice if the community could vote on features considered so that the
> devs would have an idea of where the major pain points are for the users
> of Cassandra.


Jira tickets can be voted on.  For example, CASSANDRA-1072 (distributed counters) was recently
committed to trunk (https://issues.apache.org/jira/browse/CASSANDRA-1072).  You can tell from
the votes and watches, as well as the discussion, that it was a popular ticket.  There were many,
many, many discussions about the feature, including alternate implementations.  Several companies
were involved.  Discussion took place mostly on the ticket, on IRC, and on the dev mailing list.

Speaking of which, you can also use the IRC channels to discuss tickets, features and plans.
See http://wiki.apache.org/cassandra/IRC

I can't speak to all of the things you brought up, but Jira, the dev mailing list, and IRC
are the primary ways to propose features, see what's coming, discuss pain points, etc.

The community is very active and welcomes feedback.  Thanks for taking the time.

On Jan 8, 2011, at 7:55 PM, Paul Pak wrote:

> Hi all,
> 
> After using Cassandra some time, I had some comments on Cassandra and
> hope they spark productive conversation on the list.  They are meant
> only as constructive feedback as a user of Cassandra.  While there are
> many things great about Cassandra, I still feel that the current
> implementation has two major issues that are limiting its ability to be
> used in production.  There are so many little gotchas that come up which
> most people don't find out about until you get through most of the
> implementation.  Most of the gotchas, I can live with, but the following
> items seem like too heavy a cost to me.
> 
> 1) If you have a result set with thousands of results, like an inbox,
> there is no way to efficiently handle the pages <- 1 2 3 4 5 6 7 8 9 10
> -> except by creating additional data structures on a materialized
> view.  But that means you can only get paged views on materialized
> views.  If you were to add constraints, all the paging functionality no
> longer works.  This is a basic functionality that many, many
> applications need.  Essentially it means that we can only perform the
> most basic queries in Cassandra and secondary indexes and super columns
> are near useless.  Super Columns are useless for doing complex queries
> because of a lack of secondary indexes and the fact that it needs to
> deserialize the entire row to work with it.  Regular CFs are no good
> either for queries with constraints, because the paging no longer works
> since there is no materialized view.  There is no way to get the 800th
> record in a result set without getting ALL the data up to the 800th
> record.  That is crazy!  Cassandra desperately needs an efficient
> capability to return a result set by specifying a start_column by record
> number, not key.
> 
> 2) Lack of operational support features.  For instance, no capability to
> manage Cassandra's usage of disk space on nodes.  The fact that an admin
> cannot specify where data goes or how to handle hot data, or gracefully
> stop handling writes to nodes is a fundamental problem with the
> partitioning strategy in my opinion.  I believe the entire partitioning
> strategy needs to be revisited and probably rewritten to include
> capabilities to accept administrator input on how to handle the data
> (i.e. directories, machines, etc.), easily support moving data and
> specifying where it should go, how many replicas, etc.  As it is, it is
> just not flexible enough.   What if you have particularly hot data and
> want to replicate it a dozen times to service read requests faster?  If
> a node runs out of space for sstables, I still want it to be operational
> for read requests, but not write.  When nodes are moved, we need to
> manually run cleanup.  Why is that?  If there is a safety reason, then
> how is an administrator going to know better than Cassandra that the
> operation was successful?
> 
> I know that Cassandra is a work in progress and there are many
> limitations I can live with, but it would be nice to know what the
> roadmap is for the next 12-24 months so we can get an idea of what major
> directions Cassandra is going in so we can plan accordingly.  It would
> be nice if the community could vote on features considered so that the
> devs would have an idea of where the major pain points are for the users
> of Cassandra.  The questions that are especially important are...  what
> feature additions are being considered?  And, what is being done to
> improve Cassandra's operations management?  As clusters get larger,
> having it run smoothly is critical for success with Cassandra.  I can
> live with fewer features, but if I get going and the system falls flat in
> production, that's a terrible situation.  Thanks and Happy New Year all!
> 
> Paul
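
The paging limitation in item 1 above can be illustrated with a minimal Python sketch, under stated assumptions: the wide row is simulated in memory as a sorted list of column names, and `get_slice`, `page_after`, and `nth_by_offset` are hypothetical helpers loosely modeled on a column slice query (a start column plus a count). They are not the real Thrift client API. Paging forward from the last column seen is cheap, but reaching the 800th column by position means reading all 800.

```python
from __future__ import annotations

from bisect import bisect_left

# Hypothetical in-memory stand-in for one wide row (e.g. an inbox).
# Zero-padded names keep lexical order equal to numeric order.
names = sorted(f"msg{i:05d}" for i in range(1, 2001))


def get_slice(start: str, count: int) -> list[str]:
    """Return up to `count` column names >= `start` (start is inclusive in this sketch)."""
    i = bisect_left(names, start)
    return names[i:i + count]


def page_after(last: str | None, page_size: int) -> list[str]:
    """Cursor paging: the next page begins at the last column already seen."""
    if last is None:
        return get_slice("", page_size)
    # The start bound is inclusive, so fetch one extra and drop the column we already have.
    return get_slice(last, page_size + 1)[1:]


def nth_by_offset(n: int, page_size: int = 100) -> tuple[str, int]:
    """Find the n-th (1-based) column by position; also report how many columns were read."""
    read = 0
    last = None
    while True:
        page = page_after(last, page_size)
        if not page:
            raise IndexError(n)
        if read + len(page) >= n:
            return page[n - read - 1], read + len(page)
        read += len(page)
        last = page[-1]


print(nth_by_offset(800))
```

The design point is that a slice is addressed by column name, not by position, so there is no way to jump straight to "record 800": `nth_by_offset` has to walk every earlier page, which is the cost Paul describes.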

