cassandra-user mailing list archives

From Mattias Larsson <mlars...@yahoo-inc.com>
Subject Re: Hinted Handoff storage inflation
Date Fri, 26 Oct 2012 18:56:08 GMT

On Oct 24, 2012, at 6:05 PM, aaron morton wrote:

> Hints store the columns, row key, KS name and CF id(s) for each mutation to each node,
> whereas an executed mutation will store only the most recent columns, collated with others
> under the same row key. So depending on the type of mutation, hints will take up more space.
> 
> The worst case would be lots of overwrites. After that, writing a small amount of data
> to many rows would result in a lot of the serialised space being devoted to row keys, KS name
> and CF id.
> 
> 16 GB is a lot though. What was the write workload like?
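
To make the quoted point concrete, here is a toy model of the two cases Aaron describes (overwrites vs. small writes to many rows). It is only a sketch: the function names, the 32-byte per-hint overhead and the byte accounting are assumptions for illustration, not Cassandra's actual on-disk format.

```python
# Toy model only: not Cassandra's actual storage format or sizes.

def collated_bytes(mutations):
    """Value bytes kept when only the latest value per (row key, column) survives."""
    latest = {}
    for row_key, column, value in mutations:
        latest[(row_key, column)] = value  # later writes replace earlier ones
    return sum(len(v) for v in latest.values())


def hinted_bytes(mutations, per_hint_overhead):
    """Bytes kept when every mutation is stored whole, with its row key
    and a fixed overhead standing in for KS name and CF id."""
    return sum(len(value) + len(row_key) + per_hint_overhead
               for row_key, column, value in mutations)


OVERHEAD = 32  # hypothetical bytes for KS name + CF id per hint

# Case 1: 1,000 overwrites of the same 100-byte column in one row.
overwrites = [(b"row-1", b"col", bytes(100)) for _ in range(1000)]

# Case 2: 1,000 new rows with 100 bytes each (no overwrites).
new_rows = [(b"row-%d" % i, b"col", bytes(100)) for i in range(1000)]

for name, batch in (("overwrites", overwrites), ("new rows", new_rows)):
    print(name, collated_bytes(batch), hinted_bytes(batch, OVERHEAD))
```

In this model the overwrite case shows the largest gap, because the collated side keeps a single value while the hint side keeps all 1,000. The collated side here ignores row key and column name storage entirely, so the "new rows" gap is overstated in absolute terms.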

Each write is new data only (no overwrites). Each mutation adds a row to one column family
with a column containing about 100 bytes of data, and a new row to another column family with
a SuperColumn containing 2 x 17 KiB payloads. These are sent in batches of several mutations,
but I found that the storage overhead was the same regardless of the size of the batch mutation
(i.e., 5 vs. 25 mutations made no difference). A total of 1,000,000 mutations like these are
sent over the duration of the test.
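
For reference, the raw payload volume implied by those numbers can be worked out as below. This is a back-of-the-envelope sketch that counts column values only; it assumes exactly 100 bytes and exactly 17 KiB per payload, and it ignores row keys, column names, timestamps, replication and compression.

```python
KIB = 1024

small_column = 100            # bytes, column in the first column family
super_column = 2 * 17 * KIB   # bytes, two 17 KiB payloads in the SuperColumn
mutations = 1_000_000         # total mutations sent during the test

per_mutation = small_column + super_column
total_bytes = per_mutation * mutations
total_gib = total_bytes / KIB**3

print(f"{per_mutation} bytes per mutation, {total_gib:.1f} GiB in total")
```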


> You can get an estimate of the number of keys in the Hints CF using nodetool cfstats.
> Also, some metrics in JMX will tell you how many hints are stored.
>
>> This has a huge impact on write performance as well.
> Yup. Hints are added to the same Mutation thread pool as normal mutations. They are processed
> async to the mutation request, but they still take resources to store.
>
> You can adjust how long hints are collected for with max_hint_window_in_ms in the yaml
> file.
>
> How long did the test run for?
> 

With both data centers functional, the test takes just a few minutes to run. With one data
center down, it takes 15x as long.

/dml


