cassandra-user mailing list archives

From Mohammed Guller <moham...@glassbeam.com>
Subject RE: batch_size_warn_threshold_in_kb
Date Fri, 12 Dec 2014 07:12:50 GMT
Ryan,
Thanks for the quick response.

I did see that jira before posting my question on this list. However, I didn’t see any information
about why 5kb+ data will cause instability. 5kb or even 50kb seems too small. For example,
if each mutation is 1000+ bytes, then with just 5 mutations, you will hit that threshold.

In addition, Patrick says he does not recommend more than 100 mutations per batch.
So why not warn users based only on the number of mutations in a batch?
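To make the size-versus-count question concrete, here is a minimal Python sketch of the two kinds of check. Both functions are hypothetical illustrations, not Cassandra's implementation, and the sketch assumes the size threshold is interpreted as `threshold_kb * 1024` bytes.

```python
# Hypothetical checks for illustration only; neither is Cassandra's implementation.
# Assumption: the size threshold is threshold_kb * 1024 bytes.

def size_warning(mutation_sizes, threshold_kb=5):
    """True if the batch's total size in bytes exceeds the size threshold."""
    return sum(mutation_sizes) > threshold_kb * 1024


def count_warning(mutation_sizes, max_mutations=100):
    """True if the batch contains more than max_mutations statements."""
    return len(mutation_sizes) > max_mutations


wide_batch = [1100] * 5      # few mutations, each over 1,000 bytes
narrow_batch = [50] * 100    # the ~100 x 50-byte batch discussed in this thread

for name, batch in [("wide", wide_batch), ("narrow", narrow_batch)]:
    print(name, sum(batch), size_warning(batch), count_warning(batch))
```

The wide batch trips only the size check (5,500 bytes against 5,120), while the narrow batch trips neither, which is exactly the mismatch between the two approaches.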

Mohammed

From: Ryan Svihla [mailto:rsvihla@datastax.com]
Sent: Thursday, December 11, 2014 12:56 PM
To: user@cassandra.apache.org
Subject: Re: batch_size_warn_threshold_in_kb

Nothing magic; it was just put in there based on experience. You can find the story behind the original
recommendation here:

https://issues.apache.org/jira/browse/CASSANDRA-6487

The key reasoning comes from Patrick McFadden:

"Yes that was in bytes. Just in my own experience, I don't recommend more than ~100 mutations
per batch. Doing some quick math I came up with 5k as 100 x 50 byte mutations.

Totally up for debate."
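Patrick's arithmetic, and the per-mutation budget it implies, can be checked with a short Python sketch. `mutations_that_fit` is a hypothetical helper, and treating 5 KB as 5,120 bytes is an assumption; if the threshold were read as 5,000 bytes, the 100 x 50-byte batch would sit exactly at the limit.

```python
# Assumption: 1 KB = 1,024 bytes when the threshold is converted to bytes.
THRESHOLD_BYTES = 5 * 1024  # 5,120 bytes


def mutations_that_fit(bytes_per_mutation, threshold_bytes=THRESHOLD_BYTES):
    """Hypothetical helper: how many equally sized mutations fit under the threshold."""
    return threshold_bytes // bytes_per_mutation


print(100 * 50)                  # 5000: Patrick's 100 x 50-byte batch
print(mutations_that_fit(50))    # 102
print(mutations_that_fit(1000))  # 5
```

The same budget allows only about five 1,000-byte mutations, which is the concern raised at the top of this thread.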

It's totally changeable. However, it's there in no small part because so many people mistake
the BATCH keyword for a performance optimization; the warning helps flag those cases of misuse.

On Thu, Dec 11, 2014 at 2:43 PM, Mohammed Guller <mohammed@glassbeam.com> wrote:
Hi –
The cassandra.yaml file has a property called batch_size_warn_threshold_in_kb.
The default is 5kb, and according to the comments in the yaml file, it is used to log a
WARN on any batch whose size exceeds this value in kilobytes. The comments also say caution
should be taken when increasing this threshold, as it can lead to node instability.
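As the yaml comment describes it, the setting only produces a log message; it does not reject the batch. A minimal Python sketch of that warn-only behaviour follows. `check_batch` is a hypothetical function, not Cassandra's code, and the sketch assumes the threshold is compared against the batch's total size in bytes (`threshold_kb * 1024`).

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("batch")

BATCH_SIZE_WARN_THRESHOLD_IN_KB = 5  # mirrors the default discussed in this thread


def check_batch(mutation_sizes):
    """Hypothetical: log a WARN (but do not fail) when the batch exceeds the threshold.

    Returns the total batch size in bytes.
    """
    total = sum(mutation_sizes)
    limit = BATCH_SIZE_WARN_THRESHOLD_IN_KB * 1024
    if total > limit:
        log.warning(
            "Batch of %d mutations is %d bytes, exceeding the %d KB threshold by %d bytes",
            len(mutation_sizes), total, BATCH_SIZE_WARN_THRESHOLD_IN_KB, total - limit,
        )
    return total


check_batch([50] * 100)    # 5,000 bytes: no warning
check_batch([2048] * 5)    # 10,240 bytes: logs a WARN, but the batch still proceeds
```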

Does anybody know the significance of this magic number 5kb? Why would a higher number (say
10kb) lead to node instability?

Mohammed


--

Ryan Svihla

Solution Architect