cassandra-user mailing list archives

From Sebastian Estevez <sebastian.este...@datastax.com>
Subject Re: How to measure the write amplification of C*?
Date Thu, 10 Mar 2016 18:52:01 GMT
https://issues.apache.org/jira/browse/CASSANDRA-10805

All the best,


Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com


On Thu, Mar 10, 2016 at 1:10 PM, Jeff Ferland <jbf@tubularlabs.com> wrote:

> Compaction logs show the number of bytes written and the level written to.
> Base write load = table flushed to L0.
> Write amplification = sum of all compactions written to disk for the table.
>
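The arithmetic described above can be sketched in Python. This is only a sketch under stated assumptions: the function name and the sample byte counts are hypothetical, the byte counts are supplied by hand rather than parsed from compaction logs (the log format differs between Cassandra versions), and total disk writes are taken as flush plus compaction output, ignoring the commit log.

```python
def table_write_amplification(flushed_bytes, compaction_bytes_written):
    """Estimate application-level write amplification for one table.

    flushed_bytes: bytes written by memtable flushes (the base write load).
    compaction_bytes_written: iterable of bytes written by each compaction.
    Assumption: total disk writes = flush + compaction output (no commit log).
    """
    if flushed_bytes <= 0:
        raise ValueError("flushed_bytes must be positive")
    total_written = flushed_bytes + sum(compaction_bytes_written)
    return total_written / flushed_bytes


# Hypothetical numbers: 100 bytes flushed, then two compactions
# that rewrite 100 and 200 bytes respectively.
ratio = table_write_amplification(100, [100, 200])
print(f"write amplification: {ratio:.1f}x")
```

If you prefer to count only the extra writes caused by compaction, drop `flushed_bytes` from the numerator; the result is then lower by exactly 1.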
> On Thu, Mar 10, 2016 at 9:44 AM, Dikang Gu <dikang85@gmail.com> wrote:
>
>> Hi Matt,
>>
>> Thanks for the detailed explanation! Yes, this is exactly what I'm
>> looking for, "write amplification = data written to flash/data written
>> by the host".
>>
>> We use LCS heavily in production, so I'd like to figure out the
>> amplification it causes and see what we can do to optimize it. I have
>> the metrics for "data written to flash", and I'm wondering whether there
>> is an easy way to get the "data written by the host" on each C* node?
>>
>> Thanks
>>
>> On Thu, Mar 10, 2016 at 8:48 AM, Matt Kennedy <mkennedy@datastax.com>
>> wrote:
>>
>>> TL;DR - Cassandra actually causes a ton of write amplification but it
>>> doesn't freaking matter any more. Read on for details...
>>>
>>> That slide deck does have a lot of very good information on it, but
>>> unfortunately I think it has led to a fundamental misunderstanding about
>>> Cassandra and write amplification. In particular, slide 51 vastly
>>> oversimplifies the situation.
>>>
>>> The Wikipedia definition of write amplification looks at this from the
>>> perspective of the SSD controller:
>>> https://en.wikipedia.org/wiki/Write_amplification#Calculating_the_value
>>>
>>> In short, write amplification = data written to flash/data written by
>>> the host
>>>
>>> So, if I write 1MB in my application, but the SSD has to write my 1MB,
>>> plus rearrange another 1MB of data in order to make room for it, then I've
>>> written a total of 2MB and my write amplification is 2x.
>>>
>>> In other words, it is measuring how much extra the SSD controller has to
>>> write in order to do its own housekeeping.
>>>
>>> However, the Wikipedia definition is a bit more constrained than how the
>>> term is used in the storage industry. The whole point of looking at write
>>> amplification is to understand the impact that a particular workload is
>>> going to have on the underlying NAND by virtue of the data written. So a
>>> definition of write amplification that is a little more relevant to the
>>> context of Cassandra is to consider this:
>>>
>>> write amplification = data written to flash/data written to the database
>>>
>>> So, while the fact that we only sequentially write large immutable
>>> SSTables does in fact mean that controller-level write amplification is
>>> close to 1x (almost no extra writes), compaction comes along and
>>> completely destroys that tidy little story. Think about it: every time a
>>> compaction rewrites data that has already been written, we are creating
>>> application-level write amplification. Different compaction strategies
>>> and the workload itself determine what the real application-level write
>>> amp is, but generally speaking, LCS causes the most, followed by STCS,
>>> and DTCS causes the least.
>>>
>>> To measure this, you can usually use smartctl (there may be another
>>> mechanism depending on the SSD manufacturer) to get the physical bytes
>>> written to your SSDs and divide that by the data that you've actually
>>> logically written to Cassandra. I've measured (more than two years ago)
>>> LCS write amp as high as 50x on some workloads, which is significantly
>>> higher than the typical controller-level write amp on a b-tree style
>>> update-in-place data store. Also note that the new storage engine removes
>>> a lot of inefficiency from the Cassandra storage format, which reduces
>>> the impact of write amp due to compactions.
>>>
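The division described above can be sketched in Python. This is only a sketch under stated assumptions: the function and parameter names are hypothetical, the device counter readings are assumed to have been converted to bytes already (how a drive reports bytes written is vendor-specific), and the logical byte count must come from your own accounting of what was written to Cassandra over the same window.

```python
def device_write_amplification(physical_bytes_start, physical_bytes_end,
                               logical_bytes_written):
    """Ratio of bytes the SSD physically wrote to bytes written to the database.

    physical_bytes_start / physical_bytes_end: device-reported bytes written,
    sampled at the start and end of the measurement window.
    logical_bytes_written: bytes logically written to Cassandra in that window.
    """
    if logical_bytes_written <= 0:
        raise ValueError("logical_bytes_written must be positive")
    physical_delta = physical_bytes_end - physical_bytes_start
    if physical_delta < 0:
        raise ValueError("device counter went backwards")
    return physical_delta / logical_bytes_written


MB = 1024 * 1024
# The 1MB example from earlier in this message: the application writes 1MB
# and the device ends up writing 2MB in total.
print(device_write_amplification(0, 2 * MB, 1 * MB))
```

Measuring over a window (two counter samples) rather than from a single lifetime total keeps writes from before the test out of the ratio.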
>>> However, if you're a person that understands SSDs, at this point you're
>>> wondering why we aren't burning out SSDs right and left. The reality is
>>> that general SSD endurance has gotten so good, that all this write amp
>>> isn't really a problem any more. If you're curious to read more about that,
>>> I recommend you start here:
>>>
>>>
>>> http://hothardware.com/news/google-data-center-ssd-research-report-offers-surprising-results-slc-not-more-reliable-than-mlc-flash
>>>
>>> and the paper that article mentions:
>>>
>>> http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f.r43.cf2.rackcdn.com/23105-fast16-papers-schroeder.pdf
>>>
>>>
>>> Hope this helps.
>>>
>>>
>>> Matt Kennedy
>>>
>>>
>>>
>>> On Thu, Mar 10, 2016 at 7:05 AM, Paulo Motta <pauloricardomg@gmail.com>
>>> wrote:
>>>
>>>> This is a good source on Cassandra + write amplification:
>>>> http://www.slideshare.net/rbranson/cassandra-and-solid-state-drives
>>>>
>>>> 2016-03-10 9:57 GMT-03:00 Benjamin Lerer <benjamin.lerer@datastax.com>:
>>>>
>>>>> Cassandra should not cause any write amplification. Write amplification
>>>>> happens only when you update data on SSDs. Cassandra does not update
>>>>> any data in place. Data can be rewritten during compaction, but it is
>>>>> never updated.
>>>>>
>>>>> Benjamin
>>>>>
>>>>> On Thu, Mar 10, 2016 at 12:42 PM, Alain RODRIGUEZ <arodrime@gmail.com>
>>>>> wrote:
>>>>>
>>>>> > Hi Dikang,
>>>>> >
>>>>> > I am not sure what you mean by "amplification", but since sizes
>>>>> > depend heavily on the structure, I would probably try it using CCM
>>>>> > (https://github.com/pcmanus/ccm) or a test cluster with
>>>>> > production-like settings and schema. You can write a row, flush it,
>>>>> > and see how big the data is cluster-wide / per node.
>>>>> >
>>>>> > Hope this will be of some help.
>>>>> >
>>>>> > C*heers,
>>>>> > -----------------------
>>>>> > Alain Rodriguez - alain@thelastpickle.com
>>>>> > France
>>>>> >
>>>>> > The Last Pickle - Apache Cassandra Consulting
>>>>> > http://www.thelastpickle.com
>>>>> >
>>>>> > 2016-03-10 7:18 GMT+01:00 Dikang Gu <dikang85@gmail.com>:
>>>>> >
>>>>> > > Hello there,
>>>>> > >
>>>>> > > I'm wondering whether there is a good way to measure the write
>>>>> > > amplification of Cassandra.
>>>>> > >
>>>>> > > I'm thinking it could be calculated by (size of mutations written
>>>>> > > to the node)/(number of bytes written to the disk).
>>>>> > >
>>>>> > > Do we already have a metric for "size of mutations written to the
>>>>> > > node"? I did not find it in the JMX metrics.
>>>>> > >
>>>>> > > Thanks
>>>>> > >
>>>>> > > --
>>>>> > > Dikang
>>>>> > >
>>>>> > >
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Dikang
>>
>>
>
