cassandra-commits mailing list archives

From "Joel Knighton (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-13873) Ref bug in Scrub
Date Thu, 19 Oct 2017 02:47:00 GMT


Joel Knighton updated CASSANDRA-13873:
    Reproduced In: 3.11.0, 3.10, 4.0  (was: 3.10, 3.11.0, 4.0)
           Status: Patch Available  (was: In Progress)

It looks like this situation can occur when referencing canonical sstables. As far as I can
tell, the issue reproduces only when an sstable is in a lifecycle transaction and has no
referencers other than its selfRef. If the lifecycle transaction updates this sstable, we
put a new instance of the sstable reader in the tracker. This causes no problems when getting
live sstables, but the canonical sstables can also include sstable readers from the compacting
set. In that case, the original instance of the updated sstable reader is still in the compacting
set, but we can't reference it when we try to select and reference canonical sstables, because
its instance tidier ran when its last ref was released in the lifecycle transaction. Note that
the global tidier doesn't run, since the updated sstable reader (the new instance) is still referenced.

With the reproduction provided in the issue description (multiple scrubs), the scrubs eventually
proceed once the lifecycle transaction finishes, since finishing puts an updated sstable reader in
the tracker. If a lifecycle transaction ever needed to select canonical sstables in order to
proceed, this could cause a deadlock.
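The failure mode above can be illustrated with a toy model. This is a minimal sketch under stated assumptions. `ReaderInstance`, `TrackerModel`, `canonical()` and `selectAndReference(int)` are hypothetical stand-ins, not Cassandra's `Ref`, `SSTableReader` or `Tracker` APIs. The canonical view is also simplified to "every compacting instance, plus live readers that are not compacting". The sketch shows two things: a reader instance whose count has reached zero can never be referenced again, and a selection that insists on that instance can only retry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for one sstable reader *instance*; not Cassandra's SSTableReader/Ref API.
final class ReaderInstance {
    final String name;
    // Starts at 1, modelling the instance's selfRef.
    private final AtomicInteger refs = new AtomicInteger(1);

    ReaderInstance(String name) { this.name = name; }

    // Succeeds only while the count is positive: once it has reached zero the
    // instance is considered tidied and can never be referenced again.
    boolean tryRef() {
        while (true) {
            int n = refs.get();
            if (n <= 0) return false;
            if (refs.compareAndSet(n, n + 1)) return true;
        }
    }

    // Dropping the last reference is the point where an instance tidier would run.
    void release() { refs.decrementAndGet(); }
}

// Hypothetical tracker holding a live set and a compacting set.
final class TrackerModel {
    final List<ReaderInstance> live = new ArrayList<>();
    final List<ReaderInstance> compacting = new ArrayList<>();

    // Simplified canonical view: every compacting instance, plus live readers
    // that are not shadowed by a compacting instance of the same sstable.
    List<ReaderInstance> canonical() {
        List<ReaderInstance> result = new ArrayList<>(compacting);
        for (ReaderInstance r : live) {
            boolean shadowed = compacting.stream().anyMatch(c -> c.name.equals(r.name));
            if (!shadowed) result.add(r);
        }
        return result;
    }

    // Reference all canonical readers or none; on failure, release what was
    // acquired and retry. Capped at maxAttempts so the sketch terminates.
    List<ReaderInstance> selectAndReference(int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            List<ReaderInstance> acquired = new ArrayList<>();
            boolean ok = true;
            for (ReaderInstance r : canonical()) {
                if (!r.tryRef()) { ok = false; break; }
                acquired.add(r);
            }
            if (ok) return acquired;
            acquired.forEach(ReaderInstance::release); // back out and spin again
        }
        return null; // still could not capture every reader
    }
}

public class CanonicalSpinDemo {
    public static void main(String[] args) {
        TrackerModel tracker = new TrackerModel();
        ReaderInstance original = new ReaderInstance("mc-5");
        tracker.live.add(original);
        tracker.compacting.add(original); // marked compacting by a lifecycle transaction

        // The transaction updates the sstable: a new instance replaces it in the live set...
        tracker.live.set(0, new ReaderInstance("mc-5"));
        // ...and the original's only ref (its selfRef) is released.
        original.release();

        // Live selection still works, but the canonical view holds the dead original.
        System.out.println(tracker.live.get(0).tryRef());   // true
        System.out.println(tracker.selectAndReference(3)); // null
    }
}
```

In the report, the selection keeps retrying until the transaction finishes and the tracker changes, which is why the "Spinning trying to capture readers" warning appears. The `maxAttempts` cap exists only so this sketch terminates.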

I pushed a branch at [c13873-2.2|]
that implements the simplest fix I can think of. The patch references the original sstables
involved in a lifecycle transaction when the transaction is created, and releases these references
in postCleanup or when an sstable reader is cancelled from the transaction. I merged this forward
and tests came back clean on all active branches. I'm not sure if there is some existing mechanism
that should cover this case - maybe [~krummas] knows from reviewing [CASSANDRA-9699]?
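The shape of the fix can be modelled the same way. This is again a hypothetical sketch, not the code on the `c13873-2.2` branch. `PinnedReader`, `PinningTransaction`, `cancel` and `postCleanup` here only mimic the behaviour described above: the transaction takes an extra reference on each original when it is created and drops that reference on cancel or postCleanup. Because the transaction holds the extra reference, releasing the original's selfRef during an update no longer drives its count to zero. Canonical selection can therefore still reference it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reader instance with a simple reference count (starts at 1 for its selfRef).
final class PinnedReader {
    final String name;
    private final AtomicInteger refs = new AtomicInteger(1);

    PinnedReader(String name) { this.name = name; }

    // Fails once the count has reached zero (instance already tidied).
    boolean tryRef() {
        while (true) {
            int n = refs.get();
            if (n <= 0) return false;
            if (refs.compareAndSet(n, n + 1)) return true;
        }
    }

    void release() { refs.decrementAndGet(); }

    int refCount() { return refs.get(); }
}

// Hypothetical transaction that pins its original readers for its whole lifetime.
final class PinningTransaction {
    private final List<PinnedReader> pinned = new ArrayList<>();

    PinningTransaction(List<PinnedReader> originals) {
        for (PinnedReader r : originals) {
            if (!r.tryRef()) {
                pinned.forEach(PinnedReader::release); // undo partial pinning
                throw new IllegalStateException("original already released: " + r.name);
            }
            pinned.add(r);
        }
    }

    // Cancelling a reader from the transaction drops the pin on that reader only.
    void cancel(PinnedReader r) {
        if (pinned.remove(r)) r.release();
    }

    // Once the transaction is finished, drop every remaining pin.
    void postCleanup() {
        pinned.forEach(PinnedReader::release);
        pinned.clear();
    }
}

public class PinningFixDemo {
    public static void main(String[] args) {
        PinnedReader original = new PinnedReader("mc-5");                         // refs = 1
        PinningTransaction txn = new PinningTransaction(Arrays.asList(original));  // refs = 2

        original.release();                      // update drops the selfRef: refs = 1
        System.out.println(original.tryRef());   // true: canonical selection still works (refs = 2)
        original.release();                      // the selecting operation is done: refs = 1

        txn.postCleanup();                       // transaction's pin dropped: refs = 0
        System.out.println(original.refCount()); // 0
    }
}
```

In this model, the transaction owns a reference for exactly as long as it may still expose the original through the compacting set. The instance cannot be tidied early.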

> Ref bug in Scrub
> ----------------
>                 Key: CASSANDRA-13873
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: T Jake Luciani
>            Assignee: Joel Knighton
>            Priority: Critical
> I'm hitting a Ref bug when many scrubs run against a node. This doesn't happen on 3.0.X.
> I'm not sure whether this also happens with compactions, but I suspect it does.
> I'm not seeing any Ref leaks or double frees.
> To Reproduce:
> {quote}
> ./tools/bin/cassandra-stress write n=10m -rate threads=100
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> {quote}
> Eventually in the logs you get:
> WARN  [RMI TCP Connection(4)-] 2017-09-14 15:51:26,722
> - Spinning trying to capture readers [BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-5-big-Data.db'),
> *released: [BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-5-big-Data.db')],*

> This released table has a selfRef of 0 but is in the Tracker

This message was sent by Atlassian JIRA
