cassandra-commits mailing list archives

From "Benjamin Lerer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-12649) Add BATCH metrics
Date Thu, 24 Nov 2016 11:05:58 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-12649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15692984#comment-15692984 ]

Benjamin Lerer commented on CASSANDRA-12649:
--------------------------------------------

Thanks for the patch [~alwyn].

I spent over a day looking at it and at the problem.

My first concern was that the benchmark did not really tell me the cost of measuring the
total size of the batch mutations. I ended up doing some profiling to get a feel for the
cost associated with it. For a batch of 100 inserts, on a table with 1 clustering column and
1 other column, the computation took around 0.1% of the CPU time needed to process the request
(bearing in mind that the JIT was able to profile more aggressively than it would in
a real production system and that the CPU incurred only a few branch mispredictions). In
my opinion this number is still reasonable if we execute that operation only once per write
request.

Looking at the patch, I realized that the {{MutationSizeHistogram}} will only be computed
for batches without conditions. The problem is that for CAS writes the mutations are only
created after the condition has been checked in {{StorageProxy}}. I decided to try to move
the {{MutationSizeHistogram}} metric to the {{StorageProxy}} level. The result is more consistent,
and less surprising for operators, as it gives the mutation size distribution for all
the write requests. The disadvantage is obviously that for some batches we will compute
the data size twice. I guess that we should address that problem at some point.
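
To illustrate the idea of recording the mutation size at a single level for both kinds of
write, here is a minimal sketch. It is not taken from the patch: the class, the method names
and the list standing in for a histogram are all hypothetical, and it assumes that a CAS write
whose condition fails records nothing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: every name here is hypothetical, not Cassandra's API. */
final class MutationSizeSketch {
    /** Stand-in for a metrics histogram: it simply keeps the recorded values. */
    final List<Long> mutationSizeHistogram = new ArrayList<>();

    /** Total data size of one write request, summed over its mutations. */
    static long totalSize(List<Long> mutationSizes) {
        long total = 0;
        for (long size : mutationSizes) {
            total += size;
        }
        return total;
    }

    /** Unconditional write path: the mutations are available immediately. */
    void mutate(List<Long> mutationSizes) {
        mutationSizeHistogram.add(totalSize(mutationSizes));
    }

    /**
     * Conditional (CAS) write path: the mutations only exist once the
     * condition has been checked, so the size is recorded after that point.
     * Assumption: a failed condition writes nothing and records nothing.
     */
    boolean cas(boolean conditionHolds, List<Long> mutationSizes) {
        if (!conditionHolds) {
            return false;
        }
        mutationSizeHistogram.add(totalSize(mutationSizes));
        return true;
    }

    public static void main(String[] args) {
        MutationSizeSketch proxy = new MutationSizeSketch();
        proxy.mutate(Arrays.asList(120L, 80L)); // unconditional batch
        proxy.cas(true, Arrays.asList(50L));    // CAS write, condition met
        proxy.cas(false, Arrays.asList(999L));  // CAS write, condition not met
        System.out.println(proxy.mutationSizeHistogram);
    }
}
```

Because both paths go through the same object, one histogram covers every write request,
which is the consistency argument made above.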

For the partitions per batch metrics, I decided to ignore the CAS batches. As they do not really
belong to the logged or unlogged categories, we would have needed another histogram and, as
they cannot be performed on more than one partition, that histogram would not bring any interesting
information.
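
As a rough sketch of that rule, the following counts distinct partitions per batch and skips
CAS batches entirely. The names are hypothetical and the lists stand in for real histograms;
a partition is identified by a "table:key" string so that the same key in two tables counts
as two partitions, as the issue description suggests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

/** Illustrative only: every name here is hypothetical, not Cassandra's API. */
final class PartitionsPerBatchSketch {
    enum BatchKind { LOGGED, UNLOGGED, CAS }

    /** Stand-ins for the two histograms: they simply keep the recorded values. */
    final List<Integer> partitionsPerLoggedBatch = new ArrayList<>();
    final List<Integer> partitionsPerUnloggedBatch = new ArrayList<>();

    /**
     * Records the number of distinct partitions touched by one batch.
     * Each entry is assumed to be "table:partitionKey", one per statement.
     */
    void record(BatchKind kind, List<String> tableAndPartitionKeys) {
        if (kind == BatchKind.CAS) {
            return; // CAS batches target a single partition, so nothing is recorded
        }
        int distinctPartitions = new HashSet<String>(tableAndPartitionKeys).size();
        if (kind == BatchKind.LOGGED) {
            partitionsPerLoggedBatch.add(distinctPartitions);
        } else {
            partitionsPerUnloggedBatch.add(distinctPartitions);
        }
    }

    public static void main(String[] args) {
        PartitionsPerBatchSketch metrics = new PartitionsPerBatchSketch();
        metrics.record(BatchKind.LOGGED, Arrays.asList("t1:a", "t1:a", "t2:a"));
        metrics.record(BatchKind.UNLOGGED, Arrays.asList("t1:a", "t1:b", "t1:c"));
        metrics.record(BatchKind.CAS, Arrays.asList("t1:a"));
        System.out.println("logged: " + metrics.partitionsPerLoggedBatch);
        System.out.println("unlogged: " + metrics.partitionsPerUnloggedBatch);
    }
}
```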

The results of my experiments on top of your patch are [here|https://github.com/apache/cassandra/compare/trunk...blerer:12649-3.X].
|[utest|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-12649-3.X-testall/]|[dtest|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-12649-3.X-dtest/]|
As the {{StorageProxy}} metrics cannot be easily unit tested, I checked them manually via
the JMX console. I also made sure that the changes did not break {{nodeTool}}.

[~alwyn] could you check the patch and tell me if you are fine with the changes I made? If
it looks good to you, I will ask [~iamaleksey] to have a look at it to make sure that I
did not miss anything.





> Add BATCH metrics
> -----------------
>
>                 Key: CASSANDRA-12649
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12649
>             Project: Cassandra
>          Issue Type: Wish
>            Reporter: Alwyn Davis
>            Priority: Minor
>             Fix For: 3.x
>
>         Attachments: 12649-3.x-v2.patch, 12649-3.x.patch, stress-batch-metrics.tar.gz, stress-trunk.tar.gz, trunk-12649.txt
>
>
> To identify causes of load on a cluster, it would be useful to have some additional metrics:
> * *Mutation size distribution:* I believe this would be relevant when tracking the performance of unlogged batches.
> * *Logged / Unlogged Partitions per batch distribution:* This would also give a count of batch types processed. Multiple distinct tables in batch would just be considered as separate partitions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
