cassandra-user mailing list archives

From "Durity, Sean R" <SEAN_R_DUR...@homedepot.com>
Subject RE: [EXTERNAL] fine tuning for wide rows and mixed workload system
Date Fri, 11 Jan 2019 15:04:43 GMT
I will start – knowing that others will have additional help/questions.

What heap size are you using? It sounds like you are using the CMS garbage collector, which takes
some arcane knowledge and lots of testing to tune. I would start with G1, using half the
available RAM as the heap size. I would want 32 GB of RAM as a minimum on the hosts.
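
For reference, a rough sketch of what that could look like in jvm.options on a 32 GB host (the heap numbers below are placeholders for illustration, not values tested against your workload):

    # comment out the CMS settings and enable the G1 section instead
    -XX:+UseG1GC
    -XX:G1RSetUpdatingPauseTimePercent=5
    -XX:MaxGCPauseMillis=500

    # fixed heap of half the RAM on a 32 GB host; set min and max to the same value
    -Xms16G
    -Xmx16G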

Spinning disks are a problem, too. Can you tell if the IO is getting overwhelmed? SSDs are
much preferred.

Read before write is usually an anti-pattern for Cassandra. From your queries, it seems you
have a partition key and a clustering key. Can you give us the table schema? I'm also concerned
about the IF EXISTS in your delete. I think that invokes a lightweight transaction, which is costly
for performance. Is it really required for your use case?
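
If your application does not actually need to know whether the row was there, a plain delete (no Paxos round trips) would look like this; just a sketch, assuming the same keys as in your query:

    delete from my_keyspace.my_table where pkey = ? and event_datetime = ?;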


Sean Durity

From: Marco Gasparini <marco.gasparini@competitoor.com>
Sent: Friday, January 11, 2019 8:20 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] fine tuning for wide rows and mixed workload system

Hello everyone,

I need some advice in order to solve my use case problem. I have already tried some solutions,
but they didn't work out.
Can you help me with the following configuration, please? Any help is much appreciated.

I'm using:
- Cassandra 3.11.3
- java version "1.8.0_191"

My use case has the following constraints:
- about 1M reads per day (and rising)
- about 2M writes per day (and rising)
- there is a high peak of requests in less than 2 hours in which the system receives half
of the whole day's traffic (500K reads, 1M writes)
- each request consists of 1 read and 2 writes (1 delete + 1 write)

            * the read query selects at most 3 records based on the partition key (select * from
my_keyspace.my_table where pkey = ? limit 3)
            * then one record is deleted (delete from my_keyspace.my_table
where pkey = ? and event_datetime = ? IF EXISTS)
            * finally the new data is stored (insert into my_keyspace.my_table (event_datetime,
pkey, agent, some_id, ft, ftt..) values (?,?,?,?,?,?...))

- each row is pretty wide. I don't really know the exact size because there are 2 dynamic
text columns that store between 1MB and 50MB of data each.
  So reads are going to be heavy because I read 3 records of that size every time, and writes
are expensive as well because each row is that wide (a rough sketch of the table, as implied by the queries above, follows this list).
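
For clarity, the table is shaped roughly like the sketch below (column types and the clustering order are simplified here, and some columns are omitted):

    create table my_keyspace.my_table (
        pkey text,
        event_datetime timestamp,
        agent text,
        some_id text,
        ft text,
        ftt text,
        -- plus the other columns elided above (two of the text columns hold 1MB-50MB each)
        primary key (pkey, event_datetime)
    ) with clustering order by (event_datetime desc);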

Currently, I own 3 nodes with the following properties:
- node1:
            * Intel Core i7-3770
            * 2x HDD SATA 3,0 TB
            * 4x RAM 8192 MB DDR3
            * nominal bit rate 175MB/s
            # blockdev --report /dev/sd[ab]
                        RO    RA   SSZ   BSZ   StartSec            Size   Device
                        rw   256   512  4096          0   3000592982016   /dev/sda
                        rw   256   512  4096          0   3000592982016   /dev/sdb

- node2,3:
            * Intel Core i7-2600
            * 2x HDD SATA 3,0 TB
            * 4x RAM 4096 MB DDR3
            * nominal bit rate 155MB/s
            # blockdev --report /dev/sd[ab]
                        RO    RA   SSZ   BSZ   StartSec            Size   Device
                        rw   256   512  4096          0   3000592982016   /dev/sda
                        rw   256   512  4096          0   3000592982016   /dev/sdb

Each node has 2 disks, but I have disabled the RAID option and created a single virtual
disk in order to get more free space.
Can this configuration cause issues?
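
The alternative, as far as I understand it, would be to skip the virtual disk and let Cassandra use both disks directly via cassandra.yaml, roughly like this (the mount points here are only examples):

    data_file_directories:
        - /mnt/disk1/cassandra/data
        - /mnt/disk2/cassandra/data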

I have already tried some configurations in order to make it work:
1) straightforward attempt
            - default Cassandra configuration (cassandra.yaml)
            - RF=1
            - SizeTieredCompactionStrategy (write-oriented strategy)
            - no row cache (given the size of the wide rows, it is better to have no row cache)
            - gc_grace_seconds = 1 day (unfortunately, I had no repair schedule at all; the
table-level options are sketched just after this attempt)
            results:
                        too many timeouts, losing data
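
In CQL terms, those per-table settings were something like the following sketch (exact syntax from memory):

    alter table my_keyspace.my_table
        with compaction = { 'class' : 'SizeTieredCompactionStrategy' }
        and gc_grace_seconds = 86400;   -- 1 day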

2)
            - added repair schedules
            - RF=3 (in order to increase read speed)
            results:
                        - too many timeouts, losing data
                        - high I/O consumption on each node (iostat shows 100% in %util on
each node, dstat shows hundreds of MB read on each iteration)
                        - node2 frozen until I stopped the data writes
                        - node3 almost frozen
                        - many pending MutationStage events in tpstats on node2
                        - many full GCs
                        - many HintsDispatchExecutor events in system.log

actual)
            - added repair schedules
            - RF=3
            - set durable_writes = false in order to speed up writes
            - increased the young generation heap size
            - decreased SurvivorRatio in order to get more young-generation space, because of
the wide rows
            - increased MaxTenuringThreshold from 1 to 3 in order to decrease read latency
            - increased Cassandra's memtable on-heap and off-heap sizes because of the wide
rows
            - changed memtable_allocation_type to offheap_objects because of the wide rows
              (these memtable, durability and JVM changes are sketched after the results below)
            results:
                        - better GC performance on node1 and node3
                        - still high I/O consumption on each node (iostat shows 100% in %util
on each node, dstat shows hundreds of MB read on each iteration)
                        - node2 still completely frozen
                        - many pending MutationStage events in tpstats on node2
                        - many HintsDispatchExecutor events in system.log on each node
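
Concretely, that last round of changes looks more or less like this (the memtable sizes and SurvivorRatio value below are indicative placeholders, not the exact values):

in cassandra.yaml:
    memtable_allocation_type: offheap_objects
    memtable_heap_space_in_mb: 2048       # placeholder, increased from the default
    memtable_offheap_space_in_mb: 2048    # placeholder, increased from the default

per keyspace, in CQL:
    alter keyspace my_keyspace with durable_writes = false;

in jvm.options (CMS young generation):
    -XX:SurvivorRatio=4            # decreased from the default 8 (placeholder value)
    -XX:MaxTenuringThreshold=3     # increased from 1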


I cannot go to AWS; I can only get dedicated servers.
Do you have any suggestions to fine-tune the system for this use case?

Thank you
Marco

