cassandra-commits mailing list archives

From "Benjamin Lerer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-12649) Add BATCH metrics
Date Mon, 07 Nov 2016 09:52:58 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-12649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643663#comment-15643663 ]

Benjamin Lerer commented on CASSANDRA-12649:
--------------------------------------------

[~appodictic] If you have questions or concerns about any of the review feedback, do not
hesitate to write them down on the ticket.

I realize now that my feedback may not have been very helpful. Sorry for that.

My main concern is that the measurement will be done on every mutation and that it involves
walking through the mutation to sum up its size. On nodes with heavy writes this might have
a non-negligible CPU cost. It is true that we compute the data size when we serialize the
data, but there we have no other choice.

Setting up a benchmark for that takes some time and unfortunately I do not have the time to
do it myself.
If I had to do it, I would try to set up a JMH benchmark to check the throughput of the {{dataSize}}
method on random data. It is too difficult to assess that kind of thing on a running Cassandra
node because of the JIT and the overall fluctuation in response time of the whole database
(the impact of the change might end up lost in the noise).
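
To make the cost in question concrete, here is a minimal plain-Java sketch of the kind of traversal being discussed. It is not Cassandra code: {{MutationSizeSketch}}, {{randomCells}} and the list-of-{{ByteBuffer}} model of a mutation are hypothetical stand-ins, and the hand-rolled timing loop in {{main}} is exposed to exactly the JIT effects that a JMH harness is designed to control, so its numbers are only indicative.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical stand-in for a mutation: a flat list of cell values.
 * This is NOT Cassandra's Mutation class, only a model of the traversal.
 */
final class MutationSizeSketch {

    /** Walks every cell and sums its remaining bytes, like a per-mutation size computation. */
    static long dataSize(List<ByteBuffer> cells) {
        long size = 0;
        for (ByteBuffer cell : cells) {
            size += cell.remaining();
        }
        return size;
    }

    /** Builds a mutation-like list of cells filled with random bytes (assumed workload). */
    static List<ByteBuffer> randomCells(int cellCount, int cellBytes, long seed) {
        Random random = new Random(seed);
        List<ByteBuffer> cells = new ArrayList<>(cellCount);
        for (int i = 0; i < cellCount; i++) {
            byte[] value = new byte[cellBytes];
            random.nextBytes(value);
            cells.add(ByteBuffer.wrap(value));
        }
        return cells;
    }

    public static void main(String[] args) {
        List<ByteBuffer> cells = randomCells(1_000, 64, 42L);
        int iterations = 100_000;
        long sink = 0; // accumulate results so the loop is not trivially dead code
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += dataSize(cells);
        }
        long elapsedNanos = System.nanoTime() - start;
        System.out.printf("%d calls in %d ms (sink=%d)%n",
                          iterations, elapsedNanos / 1_000_000, sink);
    }
}
```

In a real JMH benchmark the body of the loop would move into a method annotated with {{@Benchmark}}, the random data would be built in a {{@Setup}} method, and the result would be returned or handed to a {{Blackhole}} instead of the manual {{sink}} variable.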

Regarding,
{quote}
I am also not fully convinced by the usefulness of Logged / Unlogged Partitions per batch
distribution. Could you explain in more detail how it will be useful for you?
{quote}
I just wanted to understand why such a metric would be useful.

What I did not see was the end of [~KurtG]'s comment:
bq. We have seen a lot of users mistakenly batch against multiple partitions.

Then my question is: is there not a better way of doing it? Should we not have a setting for
rejecting batches that span multiple partitions, or a warning for them?
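
As a rough illustration of what such a setting could look like, here is a minimal sketch that counts the distinct partitions touched by a batch and compares the count with two thresholds. Everything in it is hypothetical: {{BatchPartitionGuard}}, the {{Verdict}} values, the threshold names and the use of plain strings as partition keys are assumptions made for the example, not existing Cassandra classes or options.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical guard: warns on or rejects batches that touch too many partitions. */
final class BatchPartitionGuard {

    enum Verdict { OK, WARN, REJECT }

    private final int warnThreshold; // warn when distinct partitions exceed this
    private final int failThreshold; // reject when distinct partitions exceed this

    BatchPartitionGuard(int warnThreshold, int failThreshold) {
        if (warnThreshold > failThreshold) {
            throw new IllegalArgumentException("warnThreshold must not exceed failThreshold");
        }
        this.warnThreshold = warnThreshold;
        this.failThreshold = failThreshold;
    }

    /** Partition keys are modelled as strings; one entry per statement in the batch. */
    Verdict check(List<String> partitionKeys) {
        Set<String> distinct = new HashSet<>(partitionKeys);
        int partitions = distinct.size();
        if (partitions > failThreshold) {
            return Verdict.REJECT;
        }
        if (partitions > warnThreshold) {
            return Verdict.WARN;
        }
        return Verdict.OK;
    }
}
```

With {{new BatchPartitionGuard(1, 3)}}, a batch whose statements all target one partition passes, a batch touching two or three partitions yields a warning, and a batch touching four or more is rejected. The check is one hash-set insertion per statement, which is much cheaper than summing the size of every mutation.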

[~appodictic] My only concern is the quality of what goes into Cassandra. I am not trying
to prevent anybody from contributing. If you do not understand my comments or do not agree
with them, feel free to say so on the ticket. My patches do not receive better treatment
(have a look at CASSANDRA-10707, which took me 8 months); they rarely go in on the first attempt.

> Add BATCH metrics
> -----------------
>
>                 Key: CASSANDRA-12649
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12649
>             Project: Cassandra
>          Issue Type: Wish
>            Reporter: Alwyn Davis
>            Priority: Minor
>             Fix For: 3.x
>
>         Attachments: 12649-3.x.patch, stress-batch-metrics.tar.gz, stress-trunk.tar.gz, trunk-12649.txt
>
>
> To identify causes of load on a cluster, it would be useful to have some additional metrics:
> * *Mutation size distribution:* I believe this would be relevant when tracking the performance of unlogged batches.
> * *Logged / Unlogged Partitions per batch distribution:* This would also give a count of batch types processed. Multiple distinct tables in batch would just be considered as separate partitions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
