cassandra-user mailing list archives

From Patrick Lee <patrickclee0...@gmail.com>
Subject Re: Constant blocking read repair for such a tiny table
Date Wed, 16 Oct 2019 19:16:21 GMT
we do have otc_coalescing_strategy disabled. we ran into that a long while
back, and we see better performance with it off.
most recently we also set disk_access_mode to mmap_index_only,
as we have a few clusters where we would experience a lot more disk IO,
causing high load and high cpu, so latencies were crazy high.  since setting
this to mmap_index_only we've seen a lot better overall performance.

just haven't seen this constant rate of read repairs.



On Wed, Oct 16, 2019 at 12:57 PM ZAIDI, ASAD <az192g@att.com> wrote:

> Wondering if you've disabled otc_coalescing_strategy (CASSANDRA-12676
> <https://issues.apache.org/jira/browse/CASSANDRA-12676>) since you
> upgraded from 2.x? Also, did you have any luck increasing
> native_transport_max_threads to address blocked NTRs (CASSANDRA-11363)?
>
> ~Asad
>
> *From:* Patrick Lee [mailto:patrickclee0207@gmail.com]
> *Sent:* Wednesday, October 16, 2019 12:22 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Constant blocking read repair for such a tiny table
>
>
>
> haven't really figured this out yet.  it's not a big problem but it is
> annoying for sure! the cluster was upgraded from 2.1.16 to 3.11.4.  my
> only concern is that i'm not sure whether it had this type of behavior
> before the upgrade.  i'm leaning toward a no based on my data but i'm just
> not 100% sure.
>
>
>
> just 1 table, out of all the ones on the cluster, has this behavior. repair
> has been run a few times via reaper.  i even did a nodetool compact on the
> nodes (since this table is like 1GB per node). i just don't see why there
> would be any inconsistency that would trigger read repair.
>
>
>
> any insight you may have would be appreciated!  what really started this
> digging into the cluster was that, during a stress test, the application
> team complained about high latency (30ms at p98).  this cluster is already
> oversized for this use case, with only 14GB of data per node, and there is
> more than enough ram, so all the data is basically cached in ram.
> the only thing that stands out is this crazy read repair.  so this read
> repair may not be my root issue, but it definitely shouldn't be happening
> like this.
>
>
>
> the vm's..
>
> 12 cores
>
> 82GB ram
>
> 1.2TB local ephemeral ssd's
>
>
>
> attached the info from 1 of the nodes.
>
>
>
> On Tue, Oct 15, 2019 at 2:36 PM Alain RODRIGUEZ <arodrime@gmail.com>
> wrote:
>
> Hello Patrick,
>
>
>
> Still in trouble with this? I must admit I'm really puzzled by your issue.
> I have no real idea of what's going on. Would you share with us the output
> of:
>
>
>
> - nodetool status <keyspace>
>
> - nodetool describecluster
>
> - nodetool gossipinfo
>
> - nodetool tpstats
>
>
>
> Also you said the app is running for a long time, with no changes. What
> about Cassandra? Any recent operations?
>
>
>
> I hope that with this information we might be able to understand better
> and finally be able to help.
>
>
>
> -----------------------
>
> Alain Rodriguez - alain@thelastpickle.com
>
> France / Spain
>
>
>
> The Last Pickle - Apache Cassandra Consulting
>
> http://www.thelastpickle.com
>
>
>
> Le ven. 4 oct. 2019 à 00:25, Patrick Lee <patrickclee0207@gmail.com> a
> écrit :
>
> this table was actually leveled compaction before, just changed it to size
> tiered yesterday while researching this.
>
>
>
> On Thu, Oct 3, 2019 at 4:31 PM Patrick Lee <patrickclee0207@gmail.com>
> wrote:
>
> it's not really time series data, and it's not updated very often; it
> does get some updates, but they are pretty infrequent. this thing should be
> super fast: on avg it's like 1 to 2ms p99 currently, but if they double or
> triple the traffic on that table, latencies go up to 20ms to 50ms. the only
> odd thing i see is that there are constant read repairs that follow the
> same traffic pattern as the reads, which shows up as constant writes on the
> table (from the read repairs). after a read repair or just normal full
> repairs (all full through reaper, never ran any incremental repair) i would
> expect it to not have any mismatches.  the other 5 tables they use on the
> cluster can have the same level of traffic, all very simple selects from a
> table by partition key that return a single record.
>
>
>
> On Thu, Oct 3, 2019 at 4:21 PM John Belliveau <belliveau.john@gmail.com>
> wrote:
>
> Hi Patrick,
>
>
>
> Is this time series data? If so, I have run into issues with repair on
> time series data using the SizeTieredCompactionStrategy. I have had
> better luck using the TimeWindowCompactionStrategy.
>
>
>
> John
>
>
>
>
>
>
> *From: *Patrick Lee <patrickclee0207@gmail.com>
> *Sent: *Thursday, October 3, 2019 5:14 PM
> *To: *user@cassandra.apache.org
> *Subject: *Constant blocking read repair for such a tiny table
>
>
>
> I have a cluster that is running 3.11.4 (upgraded a while back from
> 2.1.16).  what I see is a steady rate of read repair, about 10%
> constantly, on only this 1 table.  Repairs have been run (several times,
> actually).  The table does not have a lot of writes to it, so after repair,
> or even after a read repair, I would expect it to be fine.  the reason i'm
> having to dig into this so much is that under a much larger traffic load
> than their normal traffic, latencies are higher than the app team wants.
>
>
>
> I mean this thing is tiny, it's a 12x12 cluster but this 1 table is like
> 1GB per node on disk.
>
>
>
> the application team is doing reads at LOCAL_QUORUM and I can simulate
> this on that cluster by running a query using quorum and/or local_quorum
> and in the trace can see every time running the query it comes back with a
> DigestMismatchException no matter how many times I run it. that record
> hasn't been updated by the application for several months.
>
>
>
> repairs are scheduled and run every 7 days via reaper, recently in the
> past week this table has been repaired at least 3 times.  every time there
> are mismatches and data streams back and forth but yet still a constant
> rate of read repairs.
>
>
>
> curious if anyone has any recommendations on what to look into further, or
> has experienced anything like this?
>
>
>
> this node has been up for 24 hours.. this is the netstats for read repairs
>
> Mode: NORMAL
> Not sending any streams.
> Read Repair Statistics:
> Attempted: 7481
> Mismatch (Blocking): 11425375
> Mismatch (Background): 17
> Pool Name                    Active   Pending      Completed   Dropped
> Large messages                  n/a         0           1232         0
> Small messages                  n/a         0      395903678         0
> Gossip messages                 n/a         0         603746         0
>
>
>
> example of the schema... some modifications have been made to reduce
> read_repair and speculative_retry while troubleshooting.
>
> CREATE TABLE keyspace.table1 (
>     item bigint,
>     price int,
>     start_date timestamp,
>     end_date timestamp,
>     created_date timestamp,
>     cost decimal,
>     list decimal,
>     item_id int,
>     modified_date timestamp,
>     status int,
>     PRIMARY KEY ((item, price), start_date, end_date)
> ) WITH CLUSTERING ORDER BY (start_date ASC, end_date ASC)
>     AND read_repair_chance = 0.0
>     AND dclocal_read_repair_chance = 0.0
>     AND gc_grace_seconds = 864000
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = { 'keys' : 'ALL', 'rows_per_partition' : 'NONE' }
>     AND comment = ''
>     AND compaction = { 'class' : 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold' : 32, 'min_threshold' : 4 }
>     AND compression = { 'chunk_length_in_kb' : 4, 'class' : 'org.apache.cassandra.io.compress.LZ4Compressor' }
>     AND default_time_to_live = 0
>     AND speculative_retry = 'NONE'
>     AND min_index_interval = 128
>     AND max_index_interval = 2048
>     AND crc_check_chance = 1.0
>     AND cdc = false
>     AND memtable_flush_period_in_ms = 0;
>
>
>
>
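
To put the netstats counters quoted in the original post into perspective, here is a minimal Python sketch that parses the "Read Repair Statistics" section and turns the blocking mismatch count into a per-second rate. It is only an illustration: the function name `parse_read_repair_stats` is made up for this example, the sample text is copied from the output above, and the rate assumes the counters started at zero when the node came up and that the uptime is exactly the 24 hours mentioned in the post.

```python
import re

# Sample copied from the `nodetool netstats` output quoted in the thread.
NETSTATS_SAMPLE = """\
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 7481
Mismatch (Blocking): 11425375
Mismatch (Background): 17
"""

# Matches the three counter lines that follow the section header.
COUNTER_LINE = re.compile(
    r"(Attempted|Mismatch \((?:Blocking|Background)\)):\s*(\d+)"
)


def parse_read_repair_stats(text):
    """Return the read repair counters found in netstats text as a dict.

    Hypothetical helper written for this example; it is not part of
    Cassandra or of any driver.
    """
    stats = {}
    in_section = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("Read Repair Statistics"):
            in_section = True
            continue
        if not in_section:
            continue
        match = COUNTER_LINE.fullmatch(line)
        if match is None:
            # First non-counter line ends the section (e.g. "Pool Name ...").
            break
        stats[match.group(1)] = int(match.group(2))
    return stats


stats = parse_read_repair_stats(NETSTATS_SAMPLE)

# Assumption: counters cover exactly 24 hours of uptime.
uptime_seconds = 24 * 60 * 60
blocking_rate = stats["Mismatch (Blocking)"] / uptime_seconds

print(stats)
print(f"blocking mismatches per second: {blocking_rate:.1f}")
```

Under those assumptions the node is averaging a bit over 130 blocking mismatches per second, while the background mismatch count stays negligible, which is consistent with read_repair_chance and dclocal_read_repair_chance both being set to 0.0 on the table.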
