cassandra-user mailing list archives

From Benjamin Coverston
Subject Re: Backups, Snapshots, SSTable Data Files, Compaction
Date Tue, 07 Jun 2011 20:46:48 GMT
Aaron makes a good point. In my opinion, the happiest customers are the 
ones that choose smaller nodes, and more of them.

Regarding the working set, I am referring to the OS cache. On Linux, 
with JNA, Cassandra makes very effective use of memory-mapped files, 
and this is where I would expect most of your working set to reside.

The smaller the data set on each node the higher the proportion of CPU 
cycles, disk IO, network bandwidth, and memory you can dedicate to 
working with that data and making it work within your use case.
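
As a rough illustration of the point above, here is a minimal sketch of the arithmetic. The RAM, throughput, and data sizes are made-up numbers chosen only for the example, and the helper names (cache_fraction, transfer_hours) are hypothetical, not anything provided by Cassandra.

```python
# Back-of-the-envelope comparison of "few big nodes" vs "many small nodes".
# All figures below are illustrative assumptions, not measurements.

GB = 10**9
TB = 10**12


def cache_fraction(page_cache_bytes, data_bytes):
    """Fraction of a node's on-disk data that could fit in the OS page cache."""
    return min(1.0, page_cache_bytes / data_bytes)


def transfer_hours(data_bytes, throughput_bytes_per_sec):
    """Hours needed to move a node's data at a sustained throughput."""
    return data_bytes / throughput_bytes_per_sec / 3600


# Assumed: 24 GB of RAM left over for the page cache on each node,
# and 50 MB/s sustained when streaming data between nodes.
page_cache = 24 * GB
throughput = 50 * 10**6

for label, data in (("300 GB/node", 300 * GB), ("10 TB/node", 10 * TB)):
    print(
        f"{label}: {cache_fraction(page_cache, data):.2%} of data cacheable, "
        f"{transfer_hours(data, throughput):.1f} h to move"
    )
```

Under these assumptions the smaller node can keep a far larger share of its data in the OS cache and can be rebuilt or moved in a fraction of the time, which is the trade-off being described.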


On 6/7/11 2:15 PM, aaron morton wrote:
> I'd also say consider what happens during maintenance and failure scenarios. Moving 10s of
> TB around takes a lot longer than moving 100s of GB.
> Cheers
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> On 8 Jun 2011, at 06:40, AJ wrote:
>> Thanks to everyone who responded thus far.
>> On 6/7/2011 10:16 AM, Benjamin Coverston wrote:
>> <snip>
>>> Not to say that there aren't workloads where having many TB/node works,
>>> but if you're planning to read from the data you're writing, you do want to ensure that your
>>> working set is stored in memory.
>> Thank you Ben.  Can you elaborate some more on the above point?  Are you referring
>> to the OS's working set or the Cassandra caches?  Why exactly do I need to ensure this?
>> I am also wondering if there is any reason I should segregate my frequently write/read
>> smallish data set (such as usage statistics) from my bulk mostly read-only data set (static
>> content) into separate CFs if the schema allows it.  Would this be of any benefit?

Ben Coverston
Director of Operations
DataStax -- The Apache Cassandra Company
