cassandra-user mailing list archives

From Marco Gasparini <marco.gaspar...@competitoor.com>
Subject Re: read failures and high read latency
Date Tue, 27 Aug 2019 07:10:19 GMT
Thank you all for answering.

During the workload peak I'm measuring each node's statistics, CPU and
I/O statistics included, and I noticed a lot of time spent in IOWAIT
(30-40% of total CPU usage during the peak).
It seems that the bottleneck is the spinning disk, so I'm wondering if I
could modify Cassandra's configuration in order to improve the RAM
utilisation.
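
A sketch of what that could look like in cassandra.yaml — the values below
are illustrative assumptions for the 16GB nodes, not settings taken from
this thread, and they would need testing:

# hypothetical read-caching tweaks (all standard Cassandra 3.11 options)
file_cache_size_in_mb: 3072    # off-heap chunk cache; the "Maximum memory
                               # usage reached (2147483648)" log shows the
                               # current 2048 MB limit is already exhausted
key_cache_size_in_mb: 512      # a bigger key cache avoids partition-index
                               # seeks on the spinning disks
row_cache_size_in_mb: 1024     # off-heap row cache; only helps for rows that
                               # are re-read, and the table's caching option
                               # would need rows_per_partition changed
                               # from 'NONE'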

> Not just saturating the drives - note some of those nodes have only 4GB
ram for max heap.
What do you mean?

> Cassandra performs very poorly with payloads > 1MB.  20MB is WAY too
big.  What you need is a blob / object store, not Cassandra.
Yes, we understood that, but we chose Cassandra for other reasons and now
we need to keep it.
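
A common workaround when large payloads have to stay in Cassandra is to
split each blob into chunks below ~1MB; this is only a hedged sketch (the
table and column names are hypothetical, not the schema from this thread):

CREATE TABLE myks.mytable_chunks (      -- hypothetical chunked variant
    id bigint,
    type text,
    page int,
    event_datetime timestamp,
    chunk_no int,                       -- 0..N-1, reassembled client-side
    payload blob,                       -- keep each chunk under ~1MB
    PRIMARY KEY ((id, type), page, event_datetime, chunk_no)
) WITH CLUSTERING ORDER BY (page DESC, event_datetime DESC, chunk_no ASC);

The reader would fetch all chunk_no rows for a given (page, event_datetime)
and concatenate the payloads.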

Marco


On Mon, Aug 26, 2019 at 10:21 PM Jon Haddad <jon@jonhaddad.com>
wrote:

> Not just saturating the drives - note some of those nodes have only 4GB
> ram for max heap.
>
> Cassandra performs very poorly with payloads > 1MB.  20MB is WAY too big.
> What you need is a blob / object store, not Cassandra.
>
> Jon
>
> On Mon, Aug 26, 2019 at 9:45 AM Marc Selwan <marc.selwan@datastax.com>
> wrote:
>
>> > I do queries that read 3 rows at a time where the total data size is
>> > between 5MB and 20MB
>>
>> There's a good chance you're saturating those drives with payloads like
>> that. Do you happen to have dashboards or capture IO metrics?
>>
>> Best,
>> Marc Selwan | DataStax | PM, Server Team | (925) 413-7079 |
>> Twitter <https://twitter.com/MarcSelwan>
>>
>> Quick links | DataStax <http://www.datastax.com> | Training
>> <http://www.academy.datastax.com> | Documentation
>> <http://www.datastax.com/documentation/getting_started/doc/getting_started/gettingStartedIntro_r.html>
>> | Downloads <http://www.datastax.com/download>
>>
>>
>>
>> On Mon, Aug 26, 2019 at 2:29 AM Marco Gasparini <
>> marco.gasparini@competitoor.com> wrote:
>>
>>> Hi,
>>>
>>>
>>> The error is the following:
>>>
>>> All host(s) tried for query failed. First host tried,
>>> xxx.xxx.xxx.xxx:9042: Host considered as DOWN.
>>>
>>>
>>> In system.log I don't have any exceptions.
>>>
>>> I see 4 odd log messages:
>>>
>>> - periodically, StatusLogger prints the thread-pool table ("Pool Name
>>> Active   Pending   Completed   Blocked   All Time Blocked")
>>>
>>> - Maximum memory usage reached (2147483648), cannot allocate chunk
>>> of 1048576 (2147483648 bytes is exactly the configured
>>> file_cache_size_in_mb of 2048, so the chunk cache is already at its limit)
>>>
>>> - DroppedMessages: READ messages were dropped in last 5000 ms: 0
>>> internal and 1 cross node. Mean internal dropped latency: 0 ms and Mean
>>> cross-node dropped latency: 5960 ms
>>>
>>> - Some operations were slow, details available at debug level
>>>
>>>
>>>
>>>
>>> On Mon, Aug 26, 2019 at 11:15 AM Inquistive allen <
>>> inquiallen@gmail.com> wrote:
>>>
>>>> Hello Marco,
>>>>
>>>> Could you please share any errors and exceptions seen in the
>>>> system.log files in the environment?
>>>>
>>>> Thanks
>>>>
>>>> On Mon, 26 Aug, 2019, 2:40 PM Marco Gasparini, <
>>>> marco.gasparini@competitoor.com> wrote:
>>>>
>>>>> Hi everybody,
>>>>>
>>>>> I'm experiencing some read failures and high read latency (see the
>>>>> attached picture for more details).
>>>>>
>>>>> - I have a cluster of 6 nodes with 1.5TB of occupied disk space for
>>>>> each node. Running Cassandra 3.11.4
>>>>>
>>>>> 4 nodes have 32GB of RAM; the Cassandra heap is Xms8G Xmx8G.
>>>>> 2 nodes have 16GB of RAM; the Cassandra heap is Xms4G Xmx4G.
>>>>>
>>>>> Each node has a spinning disk.
>>>>>
>>>>> - Some fields from cassandra.yaml configuration:
>>>>>
>>>>> concurrent_reads: 64
>>>>> concurrent_writes: 64
>>>>> concurrent_counter_writes: 64
>>>>>
>>>>> file_cache_size_in_mb: 2048
>>>>>
>>>>> memtable_cleanup_threshold: 0.2
>>>>> memtable_flush_writers: 4
>>>>> memtable_allocation_type: offheap_objects
>>>>>
>>>>> - CQL schema and RF:
>>>>>
>>>>> CREATE KEYSPACE myks WITH replication = {'class':
>>>>> 'NetworkTopologyStrategy', 'DC1': '3'}  AND durable_writes = false;
>>>>> CREATE TABLE myks.mytable (
>>>>>     id bigint,
>>>>>     type text,
>>>>>     page int,
>>>>>     event_datetime timestamp,
>>>>>     agent text,
>>>>>     portion text,
>>>>>     raw text,
>>>>>     status int,
>>>>>     status_code_pass int,
>>>>>     dom bigint,
>>>>>     reached text,
>>>>>     tt text,
>>>>>     PRIMARY KEY ((id, type), page, event_datetime)
>>>>> ) WITH CLUSTERING ORDER BY (page DESC, event_datetime DESC)
>>>>>     AND bloom_filter_fp_chance = 0.01
>>>>>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>>>>>     AND comment = ''
>>>>>     AND compaction = {'class':
>>>>> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
>>>>> 'max_threshold': '32', 'min_threshold': '4'}
>>>>>     AND compression = {'chunk_length_in_kb': '64', 'class':
>>>>> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>>>>     AND crc_check_chance = 1.0
>>>>>     AND dclocal_read_repair_chance = 0.1
>>>>>     AND default_time_to_live = 0
>>>>>     AND gc_grace_seconds = 90000
>>>>>     AND max_index_interval = 2048
>>>>>     AND memtable_flush_period_in_ms = 0
>>>>>     AND min_index_interval = 128
>>>>>     AND read_repair_chance = 0.0
>>>>>     AND speculative_retry = '99PERCENTILE';
>>>>>
>>>>>
>>>>> - I do queries that read 3 rows at a time where the total data size
>>>>> is between 5MB and 20MB (a sketch of such a query follows).
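>>>>>
>>>>> For illustration only (the real statement isn't shown in this thread,
>>>>> and the id/type values are hypothetical), a read of the 3 newest rows
>>>>> of one partition would look like:
>>>>>
>>>>> SELECT id, type, page, event_datetime, raw
>>>>> FROM myks.mytable
>>>>> WHERE id = 12345 AND type = 'snapshot'  -- hypothetical partition key
>>>>> LIMIT 3;  -- clustering order (page DESC, event_datetime DESC)
>>>>>           -- makes LIMIT 3 return the newest rows first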
>>>>>
>>>>>
>>>>> How can I improve the read performance?
>>>>> I could stand to lose some write speed in order to improve the
>>>>> read speed.
>>>>>
>>>>> If you need more information, please ask.
>>>>>
>>>>>
>>>>> Thanks
>>>>> Marco
>>>>> [image: grafana_cassandra.png]
>>>>>
>>>>
