cassandra-commits mailing list archives

From "Joel Knighton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-13873) Ref bug in Scrub
Date Thu, 19 Oct 2017 14:06:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16211081#comment-16211081 ]

Joel Knighton commented on CASSANDRA-13873:
-------------------------------------------

You're correct that cancelling will also finish the txn and allow operations to select and
reference canonical sstables. In the specific repro that Jake provided, which is the case
of multiple scrubs over the same cfs (an admittedly somewhat artificial case), we'll try to
select and reference canonical sstables in the snapshot before cancelling the original scrub
compaction, so the new scrubs will hang until the original scrub finishes.
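
To make the hang concrete, here is a minimal, self-contained sketch of the pattern as I understand it. It is not the Cassandra code: `Reader`, `tryRefAll` and `selectAndReference` are hypothetical stand-ins, and the retry loop is bounded only so the example terminates. It assumes a caller must reference every tracked reader or none, and that a reader whose count has reached zero can no longer be referenced even though it is still listed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class SpinCaptureSketch {
    /** Hypothetical stand-in for a ref-counted sstable reader. */
    static final class Reader {
        final String name;
        // Starts at 1 to model the reader's own "self" reference.
        private final AtomicInteger refs = new AtomicInteger(1);

        Reader(String name) { this.name = name; }

        /** Take a reference unless the count has already dropped to zero. */
        boolean tryRef() {
            while (true) {
                int cur = refs.get();
                if (cur <= 0) return false;
                if (refs.compareAndSet(cur, cur + 1)) return true;
            }
        }

        void release() { refs.decrementAndGet(); }

        int refCount() { return refs.get(); }
    }

    /** Reference every reader or none; undo partial work on failure. */
    static boolean tryRefAll(List<Reader> readers) {
        List<Reader> taken = new ArrayList<>();
        for (Reader r : readers) {
            if (!r.tryRef()) {
                for (Reader t : taken) t.release();
                return false;
            }
            taken.add(r);
        }
        return true;
    }

    /**
     * Retry until all tracked readers are referenced.
     * Returns the attempt that succeeded, or -1 after maxAttempts
     * (the bound is an assumption of this sketch).
     */
    static int selectAndReference(List<Reader> tracked, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (tryRefAll(tracked)) return attempt;
            Thread.yield(); // spin: nothing changes until the tracked set does
        }
        return -1;
    }

    public static void main(String[] args) {
        Reader a = new Reader("mc-5");
        Reader b = new Reader("mc-20");
        List<Reader> tracked = new ArrayList<>(List.of(a, b));

        a.release(); // self reference gone, but still in the tracked list
        System.out.println("while released reader is tracked: "
                + selectAndReference(tracked, 1000));

        tracked.remove(a); // models the original operation finishing
        System.out.println("after tracked set is updated: "
                + selectAndReference(tracked, 1000));
    }
}
```

In the sketch, no amount of retrying helps while the released reader is still tracked; progress only resumes once the tracked set changes, which is the role the original scrub finishing plays in the repro.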

That'd be great if you could review. I'm admittedly very unfamiliar with this part of the
code, so I expect my initial patch is a rough sketch of the eventual solution.

As far as criticality goes, I could go either way. I know of no situations in which this causes
data loss or permanent deadlocks at this time, but it can cause operations that reference
canonical sstables to hang for long periods of time.

> Ref bug in Scrub
> ----------------
>
>                 Key: CASSANDRA-13873
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13873
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: T Jake Luciani
>            Assignee: Joel Knighton
>            Priority: Critical
>
> I'm hitting a Ref bug when many scrubs run against a node.  This doesn't happen on 3.0.X.
> I'm not sure whether this happens with compactions too, but I suspect it does.
> I'm not seeing any Ref leaks or double frees.
> To Reproduce:
> {quote}
> ./tools/bin/cassandra-stress write n=10m -rate threads=100
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> {quote}
> Eventually in the logs you get:
> WARN  [RMI TCP Connection(4)-127.0.0.1] 2017-09-14 15:51:26,722 NoSpamLogger.java:97 - Spinning trying to capture readers
> [BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-5-big-Data.db'),
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-32-big-Data.db'),
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-31-big-Data.db'),
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-29-big-Data.db'),
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-27-big-Data.db'),
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-26-big-Data.db'),
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-20-big-Data.db')],
> *released: [BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-5-big-Data.db')],*

> This released table has a selfRef of 0 but is in the Tracker



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org

