cassandra-user mailing list archives

From AJ
Subject Re: Ideas for Big Data Support
Date Thu, 09 Jun 2011 15:24:00 GMT
On 6/9/2011 8:40 AM, Edward Capriolo wrote:
> Some of these things are challenges, and a few are being worked on in 
> one way or another.
> 1) Dynamic snitch was implemented to detect slow-acting nodes and 
> rebalance load.
> 2) You can budget bootstrap with rsync, as long as you know what data 
> to copy where. 0.7.X made the data moving process more efficient.

Still, moving only 1 TB of data over a T-1 would take about 61 days.  Or 
you could ship the disks in a couple of days.
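
For anyone who wants to check the arithmetic, here is a rough sketch. It 
assumes the nominal T-1 line rate of 1.544 Mbit/s and a decimal terabyte 
(10^12 bytes); the function name and the efficiency parameter are only 
illustrative. At the full nominal rate it comes out to roughly 60 days, 
and a couple of percent of protocol overhead gets you to the figure above.

```python
# Back-of-the-envelope transfer time for bulk data over a slow link.
# Assumptions (not from any Cassandra tooling): nominal T-1 line rate,
# decimal terabyte, and an optional efficiency factor for overhead.

T1_BITS_PER_SECOND = 1.544e6  # nominal T-1 line rate in bits per second
SECONDS_PER_DAY = 86_400


def transfer_days(size_bytes: float, link_bps: float, efficiency: float = 1.0) -> float:
    """Return the days needed to move size_bytes over a link of link_bps bits/s.

    efficiency is the fraction of the line rate actually available
    for payload (1.0 means no overhead at all).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    seconds = size_bytes * 8 / (link_bps * efficiency)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    one_tb = 1e12  # decimal terabyte, in bytes
    print(f"ideal line rate: {transfer_days(one_tb, T1_BITS_PER_SECOND):.1f} days")
    print(f"98% efficiency:  {transfer_days(one_tb, T1_BITS_PER_SECOND, 0.98):.1f} days")
```

Either way the answer is about two months, so a courier with a box of 
disks wins easily at this link speed.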

> 3) There are many cases where different partition strategies can 
> theoretically be better. The question is for the normal use case what 
> is the best?
> 4) Compressed SSTables are on the way. This will be nice because they 
> can help maximize disk caches.
> 5) Compactions *are* a good thing. You can already do this by setting 
> compaction thresholds to 0. That is not great because smaller 
> compactions can run really fast and you want those to happen 
> regularly. Another way I take care of this is forcing major 
> compactions on my schedule. This makes it very unlikely that a larger 
> compaction will happen at random during peak time. 0.8.X has 
> multi-threaded compaction and a throttling limit so that looks promising.
> More nodes vs. fewer nodes: +1 for more nodes. This does not mean you 
> need to go very small, but the larger disk configurations are just 
> more painful, unless you can get very, very fast disks.

Even with a massive RAID-0?  At some point, shouldn't the disk I/O 
throughput be fast enough to compete with cache speeds?
