cassandra-commits mailing list archives

From "Martin Grotzke (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-6487) Log WARN on large batch sizes
Date Mon, 23 Nov 2015 19:21:11 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15013748#comment-15013748 ]

Martin Grotzke edited comment on CASSANDRA-6487 at 11/23/15 7:20 PM:
---------------------------------------------------------------------

[~lyubent] Can you please explain *why* the batch size is relevant in both scenarios 1) and
2)?

What are the extra costs of a single-partition batch (with multiple statements/inserts) that
justify logging this warning?
How is a single-statement batch (which obviously goes to a single partition) handled differently
from a single statement that is not sent as a BATCH?

Regarding single-partition batches, my understanding is that they don't cause any extra costs.
This understanding is based e.g. on CASSANDRA-6737 ("A batch statements on a single partition
should not create a new CF object for each update") and on http://christopher-batey.blogspot.de/2015/02/cassandra-anti-pattern-misuse-of.html,
which says (in the paragraph "So when should you use unlogged batches?") {quote}Well customer
id is the partition key, so this will be no more coordination work than a single insert and
it can be done with a single operation at the storage layer.{quote}
What's wrong with this understanding? In what way are single-partition batches more expensive?
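
To make the distinction in the question concrete, here is a minimal Python sketch of what "single-partition" versus "multi-partition" means for a batch. The {{Insert}} type and the helper names are hypothetical and are not part of Cassandra or any driver; the sketch only groups statements by table and partition key and does not model how a coordinator actually processes a batch.

```python
from collections import defaultdict
from typing import NamedTuple


class Insert(NamedTuple):
    """Hypothetical stand-in for one statement in a batch."""
    table: str
    partition_key: str
    payload: str


def partitions_touched(batch):
    """Group the statements of a batch by (table, partition key)."""
    groups = defaultdict(list)
    for stmt in batch:
        groups[(stmt.table, stmt.partition_key)].append(stmt)
    return dict(groups)


def is_single_partition(batch):
    """True if every statement in the batch targets the same partition."""
    return len(partitions_touched(batch)) == 1


# Two inserts for the same customer id: one partition.
single = [
    Insert("events", "customer-1", "a"),
    Insert("events", "customer-1", "b"),
]

# Inserts for two different customer ids: two partitions.
multi = [
    Insert("events", "customer-1", "a"),
    Insert("events", "customer-2", "b"),
]

print(is_single_partition(single))
print(is_single_partition(multi))
```

Under this view, the question above is whether a batch like {{single}} costs more than the equivalent individual inserts.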



> Log WARN on large batch sizes
> -----------------------------
>
>                 Key: CASSANDRA-6487
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6487
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Patrick McFadin
>            Assignee: Lyuben Todorov
>            Priority: Minor
>             Fix For: 2.0.8, 2.1 beta2
>
>         Attachments: 6487-cassandra-2.0.patch, 6487-cassandra-2.0_v2.patch
>
>
> Large batches on a coordinator can cause a lot of node stress. I propose adding a WARN
> log entry if batch sizes go beyond a configurable size. This will give more visibility to
> operators on something that can happen on the developer side.
> New yaml setting with 5k default.
> {{# Log WARN on any batch size exceeding this value. 5k by default.}}
> {{# Caution should be taken on increasing the size of this threshold as it can lead to node instability.}}
> {{batch_size_warn_threshold: 5k}}
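
The proposed check can be sketched roughly as follows in Python. This is an illustration under stated assumptions, not the attached patch: it assumes that "5k" means 5 * 1024 bytes and that the batch size is the sum of the per-statement sizes in bytes. The function names are hypothetical.

```python
import logging

logger = logging.getLogger("batch_size_sketch")


def parse_threshold(value: str) -> int:
    """Parse a size such as '5k' into bytes (assumption: 'k' = 1024 bytes)."""
    value = value.strip().lower()
    if value.endswith("k"):
        return int(value[:-1]) * 1024
    return int(value)


def warn_if_large(statement_sizes, threshold="5k"):
    """Log a WARN and return True if the summed size exceeds the threshold."""
    limit = parse_threshold(threshold)
    total = sum(statement_sizes)
    if total > limit:
        logger.warning(
            "Batch of %d statements is %d bytes, exceeding threshold of %d bytes",
            len(statement_sizes), total, limit,
        )
        return True
    return False


# Two 3000-byte statements exceed the 5120-byte threshold, so a WARN is logged.
warn_if_large([3000, 3000])

# A small batch stays below the threshold and logs nothing.
warn_if_large([100, 200])
```

Note that a check of this kind looks only at total size, not at how many partitions the batch touches, which is the point the comment above asks about.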



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
