cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Edward Capriolo <>
Subject Re: Ideas for Big Data Support
Date Thu, 09 Jun 2011 14:40:19 GMT
On Thu, Jun 9, 2011 at 4:23 AM, AJ <> wrote:

> [Please feel free to correct me on anything or suggest other workarounds
> that could be employed now to help.]
> Hello,
> This is purely theoretical, as I don't have a big working cluster yet and
> am still in the planning stages, but from what I understand, while Cass
> scales well horizontally, EACH node will not be able to handle well a data
> store in the terabyte range... for reasons that are understandable such as
> simple hardware and bandwidth limitations.  But, looking forward and pushing
> the envelope, I think there might be ways to at least manage these issues
> until broadband speeds, disk and memory technology catches up.
> The biggest issues with big data clusters that I am currently aware are:
> > disk I/O probs during major compaction and repairs.
> > Bandwidth limitations during new node commissioning.
> Here are a few ideas I've thought of:
> 1.)  Load-balancing:
> During a major compaction or repair or other similar severe performance
> impacting processes, allow the node to broadcast that it is temporarily
> unavailable so requests for data can be sent to other nodes in the cluster.
>  The node could still "wake-up" and pause or cancel it's compaction in the
> case of a failed node whereby there are no other nodes that can provide the
> data requested.  The node could be considered as "degraded" by other nodes,
> rather than down.  (As a matter of fact, a general load-balancing scheme
> could be devised if each node broadcasts it's current load level and maybe
> even hop-count between data centers.)
> 2.)  Data Transplants:
> Since commissioning a new node that is due to receive data in the TB range
> (data xfer could take days or weeks), it would be much more efficient to
> just courier the data.  Perhaps the SSTables (maybe from a snapshot) could
> be transplanted from one production node into a new node to help jump-start
> the bootstrap process.  The new node could sort things out during the
> bootstrapping phase so that it is balanced correctly as if it had started
> out with no data as usual.  If this could cut down on half the bandwidth,
> that would be a great benefit.  However, this would work well mostly if the
> transplanted data came from a keyspace that used a random partitioner;
> coming from an ordered partioner may not be so helpful if the rows in the
> transplanted data would never be used in the new node.
> 3.)  Strategic Partitioning:
> Of course, there are surely other issues to contend with, such as RAM
> requirements for caching purposes.  That may be managed by a partition
> strategy that allows certain nodes to specialize in a certain subset of the
> data, such as geographically or whatever the designer chooses.  Replication
> would still be done as usual but this may help the cache to be better
> utilized by allowing it to focus on the subset of data that comprises the
> majority of the node's data versus a random sampling of the entire cluster.
>  IOW, while a node may specialize in a certain subset and also contain
> replicated rows from outside that subset, it will still only (mostly) be
> queried for data from within it's subset and thus the cache will contain
> mostly data from this special subset which could increase the hit rate of
> the cache.
> This may not be a huge help for TB sized data nodes since the even 32 GB of
> RAM would still be relatively tiny in comparison to the data size, but I
> include it just in case it spurs other ideas.  Also, I do not know how Cass
> decides on which node to query for data in the first place... maybe not the
> best idea.
> 4.)  Compressed Columns:
> Some sort of data compression of certain columns could be very helpful
> especially since text can be compressed to less than 50% if the conditions
> are right.  Overall native disk compression will not help the bandwidth
> issue since the data would be decompressed before transit.  If the data was
> stored compressed, then Cass could even send the data to the client
> compressed so as to offload the decompression to the client.  Likewise,
> during node commission, the data would never have to be decompressed saving
> on CPU and BW.  Alternately, a client could tell Cass to decompress the data
> before transmit if needed.  This, combined with idea #1 (transplants) could
> help speed-up new node bootstraping, but only when a large portion of the
> data consists of very large column values and thus compression is practical
> and efficient.  Of course, the client could handle all the compression today
> without Cass even knowing about it, so building this into Cass would be just
> a convenience, but still nice to have, nonetheless.
> 5.)  Postponed Major Compactions:
> The option to postpone auto-triggered major compactions until a pre-defined
> time of day or week or until staff can do it manually.
> 6.?)  Finally, some have suggested just using more nodes with less data
> storage which may solve most if not all of these problems.  But, I'm still
> fuzzy on that.  The trade-offs would be more infrastructure and maintenance
> costs, higher chance that a server will fail... maybe higher bandwidth
> between nodes due to a large cluster???  I need more clarity on this
> alternative.  Imagine a total data size of 100 TBs and the choice between
> 200 nodes or 50.  What is the cost of more nodes; all things being equal?
> Please contribute additional ideas and strategies/patterns for the benefit
> of all!
> Thanks for listening and keep up the good work guys!
Some of these things are challenges, and a few are being worked on in one
way or another.

1) Dynamic snitch was implemented to determine slow acting nodes and
re-balance load.

2) You can budget bootstrap with rsync, as long as you know what data to
copy where. 0.7.X made the data moving process more efficient.

3) There are many cases where different partition strategies can
theoretically be better. The question is for the normal use case what is the

4) Compressed SSTables is on the way. This will be nice because it can help
maximize disk caches.

5) Compaction's *are* a good thing. You can already do this by setting
compaction thresholds to 0. That is not great because smaller compactions
can run really fast and you want those to happen regularly. Another way I
take care of this is forcing major compactions on my schedule. This makes it
very unlikely that a larger compaction will happen at random during peak
time. 0.8.X has multi-threaded compaction and a throttling limit so that
looks promising.

More nodes vs less nodes..+1 more nodes. This does not mean you need to go
very small, but the larger disk configurations are just more painful. Unless
you can get very/very/very fast disks.

View raw message