cassandra-user mailing list archives

From Jeff Jirsa <jji...@gmail.com>
Subject Re: Wrong consistency level
Date Tue, 01 Jan 2019 20:34:39 GMT
There are two types of read repair:

- Blocking/foreground read repair, triggered when you read at your consistency level
(LOCAL_QUORUM for you) and the responses don't match

- Probabilistic read repair, which queries extra hosts in advance and read-repairs them if
they mismatch, AFTER responding to the caller/client

You’ve disabled the latter, but you can’t disable the former (there’s a proposal to make
that configurable, but I don’t recall whether it’s been committed, and I’m mobile so I’m not
going to go search JIRA).

The big mutation is due to a large mismatch, probably caused by the bounces and by reading
before hints had replayed (the hint throttle is quite low in 3.11; you may want to increase it).
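
For reference, a sketch of the relevant cassandra.yaml knobs (the 3.11 default throttle is 1024; the higher value here is illustrative, not a recommendation):

```yaml
# cassandra.yaml -- hinted handoff throttling
# 3.11 defaults: 1024 KB/s throttle, 2 delivery threads.
# Raising the throttle lets hints replay faster after a bounce.
hinted_handoff_throttle_in_kb: 10240
max_hints_delivery_threads: 2
```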


-- 
Jeff Jirsa


> On Jan 1, 2019, at 11:51 AM, Vlad <qa23d-vvd@yahoo.com.invalid> wrote:
> 
> Hi, thanks for answer.
> 
> what I don't understand is:
> 
> - why are there read repair attempts if the repair chances are 0.0?
> - what can cause the big mutation size?
> - why didn't hinted handoff prevent the inconsistency? (because of the big mutation size?)
> 
> Thanks.
> 
> 
> On Tuesday, January 1, 2019 9:41 PM, Jeff Jirsa <jjirsa@gmail.com> wrote:
> 
> 
> Read repair due to digest mismatch and speculative retry can both cause behaviors
> that are hard to reason about (usually seen when a host stops accepting writes due to a bad
> disk, which you haven't described, but generally speaking, there are times when reads will
> block on writing to extra replicas).
> 
> The patch from https://issues.apache.org/jira/browse/CASSANDRA-10726 changes this behavior
> significantly.
> 
> The last message in this thread (about huge read repair mutations) suggests that your
> writes during the bounce got some partitions quite out of sync: hints aren't replaying
> fast enough to fill in the gaps before you read, and the read repair is timing out. A read
> repair timing out wouldn't block the read after 10726, so if you're seeing read timeouts
> right now, you probably want to run repair, read much smaller pages so that read repair
> succeeds, or increase your commitlog segment size from 32M to 128M or so until the read
> repair actually succeeds.
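>
> A minimal sketch of that knob (the value is illustrative, not a recommendation):
>
> ```yaml
> # cassandra.yaml -- the default commitlog segment size in 3.11 is 32 MiB.
> # A single mutation larger than half a segment is rejected, so a huge
> # read repair mutation needs a larger segment size to commit.
> commitlog_segment_size_in_mb: 128
> ```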
> 
> 
> On Tue, Jan 1, 2019 at 12:18 AM Vlad <qa23d-vvd@yahoo.com.invalid> wrote:
> Hi All and Happy New Year!!!
> 
> This year started with Cassandra 3.11.3 sometimes forcing consistency level ALL despite
> the queries using LOCAL_QUORUM (there is actually only one DC), and those queries fail
> with a timeout.
> 
> As far as I understand, it can be caused by read repair attempts (we see "DigestMismatch"
> errors in the Cassandra log), but the table has no read repair configured:
> 
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND comment = ''
>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
>     AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND crc_check_chance = 1.0
>     AND dclocal_read_repair_chance = 0.0
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99PERCENTILE';
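>
> These were set with something like the following (keyspace and table name are placeholders):
>
> ```sql
> -- disable both flavors of probabilistic read repair on the table
> ALTER TABLE ks.tbl
>     WITH dclocal_read_repair_chance = 0.0
>     AND read_repair_chance = 0.0;
> ```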
> 
> 
> Any suggestions?
> 
> Thanks.
> 
> 
