Aaron,
Thanks a lot for your answer,
I had in mind something more generic that I am currently working on.
The idea is a tool with GUI screens where you enter the column families you are using, along with the sizes of their columns (names and values). A second screen would hold application-aware field names and their associated values, all defined by the user. From these parameters, the (modeling) tool should be able to calculate disk usage and, hopefully, RAM usage.
Anyway, I am trying to do that for our own case using a simple Excel spreadsheet. I will let you know when it is ready.
Thanks,
Miriam 

On Thu, Jan 20, 2011 at 11:49 PM, Aaron Morton <aaron@thelastpickle.com> wrote:
Not that I know of. Do you have an existing test system you can use as a baseline?

For memory, have a read of the JVM Heap Size section here: http://wiki.apache.org/cassandra/MemtableThresholds
You will also want to leave some memory for disk caching and the OS. 8 or 12 GB feels like a good start.

For disk capacity I just did some regular old guesswork, and multiplied my number by 1.25 to 
cover the on-disk overhead. You also want to avoid using more than 50% of the local disk space, due to 
compaction and the way the disk performance falls away. There is more info available here 
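
As a rough sketch of that rule of thumb in Python: the function and parameter names are made up for illustration, and the replication factor is an added assumption that is not part of the numbers above (it defaults to 1, so it has no effect unless you set it). Treat the result as a starting estimate, not a measurement.

```python
def required_disk_gb(raw_data_gb, overhead=1.25, max_fill=0.5, replication_factor=1):
    """Estimate the total disk capacity (GB) to provision for raw_data_gb of data.

    overhead           -- multiplier for on-disk overhead (1.25 from the rule of thumb)
    max_fill           -- largest fraction of the disk you plan to use (0.5 = 50%)
    replication_factor -- hypothetical extra input: number of copies kept of each row
    """
    if raw_data_gb < 0:
        raise ValueError("raw_data_gb must be non-negative")
    if not 0 < max_fill <= 1:
        raise ValueError("max_fill must be in the range (0, 1]")
    if overhead < 1 or replication_factor < 1:
        raise ValueError("overhead and replication_factor must be at least 1")

    # Space the data is expected to occupy once written, across all copies.
    on_disk_gb = raw_data_gb * overhead * replication_factor
    # Leave headroom so the disks never go above the chosen fill level.
    return on_disk_gb / max_fill


if __name__ == "__main__":
    # 100 GB of raw data, single copy: 100 * 1.25 / 0.5
    print(required_disk_gb(100))
```

With the defaults, 100 GB of guessed raw data works out to 250 GB of disk to provision.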

How much throughput do you need? How much redundancy do you need? How much data do you 
plan to store?

Hope that helps
Aaron

On 21 Jan, 2011, at 05:04 AM, Mimi Aluminium <mimi.aluminium@gmail.com> wrote:

Hi,

We are implementing a 'middleware' layer on top of an underlying storage system and
need to estimate costs for various system configurations.
Specifically, I want to estimate the resources (memory, disk) for our
data model.

Is there a tool that, given certain storage configuration parameters
(the number and sizes of column family fields, and other details) and
workload-dependent parameters such as average read/write rates, can predict the
resource consumption (i.e., memory and disk) in an offline mode?

Thanks,
Miriam