Hi Javier,

The only bottleneck in the writes as far as I understand it is the commit log.

Sadly this is somewhat wrong, especially in your case. CPU or network limits can be reached, and other issues can happen. Plus, in your case, with counters, there are many more things involved.

The 2 main things that I saw from your comment are:

- You are using counters. When writing a counter, Apache Cassandra performs a read-before-write. In 3.11.1 there should be a counter cache that you can tune to alleviate the impact of this, but what you generally want to do with counters is to put a buffer somewhere that aggregates the increments and sends one request of '+5000' instead of 5000 requests of '+1'. The difference should be substantial.
- Given this first consideration, and in general, HDDs are not the best choice for good throughput, and they make it almost impossible to get anywhere close to millisecond latency. Having 9 disks will not make each of them faster; it only allows more concurrency, so latency will never be 'the best'.
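
To make the buffer idea from the first point concrete, here is a minimal sketch in Java, since your test client is a Java class. CounterBuffer and its methods are hypothetical names, not part of Cassandra or the driver, and the writer callback stands in for whatever actually executes the counter UPDATE. Keep in mind that anything still sitting in the buffer is lost if the client dies before flushing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical helper, not part of Cassandra or the Java driver.
// It sums increments per row key in memory and hands one aggregated
// delta per key to a caller-supplied writer when flush() is called.
class CounterBuffer {
    private final Map<String, Long> pending = new HashMap<>();

    // Record an increment without touching the database.
    synchronized void add(String rowKey, long delta) {
        pending.merge(rowKey, delta, Long::sum);
    }

    // Pass one (key, total) pair per buffered key to the writer, then
    // clear the buffer. Returns the number of keys that were flushed.
    // The writer is assumed to issue a single counter UPDATE per key.
    synchronized int flush(BiConsumer<String, Long> writer) {
        int sent = pending.size();
        pending.forEach(writer);
        pending.clear();
        return sent;
    }
}
```

Calling add() 5000 times with a delta of 1 for the same key and then flush() once results in a single call to the writer with a total of 5000, which is the '+5000' request mentioned above.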

Before making changes to the hardware, it is important to understand where the bottleneck comes from.
Ideally, I recommend dashboards; they make it easy to spot this kind of thing.
If no dashboards are available, the logs (especially warn / error / gc) could help, as could commands such as 'nodetool cfstats' or 'nodetool tpstats', to build a better understanding.

If the machines and disks can handle it and you want to try it as it is, maybe try increasing 'concurrent_counter_writes' in cassandra.yaml or increasing the counter cache size ('counter_cache_size_in_mb'), but I am really guessing here.

- I have configured all 3 nodes to act as seeds but I don't think this affects write performance.

No problem

- The hints_directory and the saved_caches_directory use the same drive as the commitlog_directory. The data is in the other 7 drives as I explained earlier. 
 
That should be fine. Did you check the disks' performance / usage?

Could the saved_caches, especially because of the counters, have a meaningful impact on the write performance?

It depends on how frequently the cache is written to disk and on its size.
 

C*heers, 
-----------------------
Alain Rodriguez - @arodream - alain@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2018-03-02 17:36 GMT+00:00 Javier Pareja <pareja.javier@gmail.com>:
Hi again,

Two more thoughts with respect to my question:
- I have configured all 3 nodes to act as seeds but I don't think this affects write performance.
- The hints_directory and the saved_caches_directory use the same drive as the commitlog_directory. The data is in the other 7 drives as I explained earlier. Could the saved_caches, especially because of the counters, have a meaningful impact on the write performance?
- If more nodes are needed for whatever reason, would a layer of virtualization on top of each machine help? Each virtual machine would be assigned dedicated drives (there are plenty of them) and would only share the CPU and RAM.

The only bottleneck in the writes as far as I understand it is the commit log. Shall I create RAID0 (for speed) or install an SSD just for the commitlog?

Thanks,
Javier


F Javier Pareja

On Fri, Mar 2, 2018 at 12:21 PM, Javier Pareja <pareja.javier@gmail.com> wrote:
Hello everyone,

I have configured a Cassandra cluster with 3 nodes, however I am not getting the write speed that I was expecting. I have tested against a counter table because it is the bottleneck of the system.
So with the system idle I run the attached sample code (very simple async writes with a throttle) against a schema with RF=2 and a table with SizeTieredCompactionStrategy.

The speeds that I get are around 65k updates-writes/second and I was hoping for at least 150k updates-writes/second. Even if I run the test on 2 machines in parallel, the throughput is 35k updates-writes/second on each. I have executed the test on the nodes themselves (on 1 and on 2 of the 3 nodes).

The nodes are fairly powerful. Each has the following configuration running Cassandra 3.11.1
- RAM: 256GB
- HDD Disks: 9 (7 configured for cassandra data, 1 for the OS and 1 configured for cassandra commits)
- CPU: 8 processors with hyperthreading => 16 processors

The RAM, CPU and HDDs are far from being maxed out when running the tests.

The test command line class uses two parameters: max executions and parallelism. Parallelism is the max number of AsyncExecutions running in parallel. Any other execution will have to wait for available slots.
I tried increasing the parallelism (64, 128, 256...) but the results are the same, 128 seems enough.
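
For readers without the attachment, the throttle described above can be sketched roughly as follows. This is not the attached class; ThrottledRunner and its parameters are hypothetical names, and the asyncWrite supplier stands in for the driver's async execute call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch of the throttle; the attached class is not shown
// in this thread, so names and structure here are assumptions.
class ThrottledRunner {
    // Runs `total` async operations with at most `parallelism` in flight.
    // Returns the highest number of operations observed in flight at once.
    static int run(int total, int parallelism,
                   Supplier<CompletableFuture<Void>> asyncWrite) {
        Semaphore slots = new Semaphore(parallelism);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxInFlight = new AtomicInteger();

        for (int i = 0; i < total; i++) {
            slots.acquireUninterruptibly();  // blocks until a slot is free
            maxInFlight.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
            asyncWrite.get().whenComplete((result, error) -> {
                inFlight.decrementAndGet();
                slots.release();  // free the slot on success or failure
            });
        }
        // Wait for the remaining operations by taking every slot back.
        slots.acquireUninterruptibly(parallelism);
        return maxInFlight.get();
    }
}
```

With this shape, raising the parallelism only helps while the cluster can absorb more concurrent requests; once the nodes are the limit, the extra slots just wait longer.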

Table definition:
CREATE TABLE counttest (
    key_column bigint,
    cluster_column int,
    count1_column counter,
    count2_column counter,
    count3_column counter,
    count4_column counter,
    count5_column counter,
    PRIMARY KEY ((key_column), cluster_column)
);

Write test data generation (from the class attached). Each insert is prepared with uniform random values from below:
            long key_column = getRandom(0, 5000000);
            int cluster_column = getRandom(0, 4096);
            long count1_column = getRandom(0, 10);
            long count2_column = getRandom(0, 10);
            long count3_column = getRandom(0, 10);
            long count4_column = getRandom(0, 10);
            long count5_column = getRandom(0, 10);
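
The getRandom helper is in the attachment and is not reproduced here. A minimal sketch of what it could look like, assuming it returns a uniform int with an exclusive upper bound (TestData and distinctRows are hypothetical names used only for illustration):

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical stand-in for the helper in the attached class.
class TestData {
    // Assumed behaviour: uniform in [min, max), i.e. max is exclusive.
    static int getRandom(int min, int max) {
        return ThreadLocalRandom.current().nextInt(min, max);
    }

    // Number of distinct (key_column, cluster_column) pairs the test can
    // generate under that assumption: 5,000,000 keys x 4096 clusterings.
    static long distinctRows() {
        return 5_000_000L * 4096L;
    }
}
```

Under that assumption the test spreads its updates over about 20.48 billion possible rows, so two updates landing on the same row should be rare.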


I suspect that we took the wrong approach when designing the hardware: should we have used more nodes and fewer drives per node? If this is the case, I am trying to understand why, or whether there is any change that we could make to the configuration (other than getting more nodes) to improve that.

Will an SSD dedicated for the commit log improve things dramatically?


Best Regards,
Javier