cassandra-commits mailing list archives

From Jan Urbański (JIRA) <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-11209) SSTable ancestor leaked reference
Date Fri, 26 Feb 2016 17:41:18 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15169389#comment-15169389 ]

Jan Urbański edited comment on CASSANDRA-11209 at 2/26/16 5:40 PM:
-------------------------------------------------------------------

OK, so we've been trying to avoid running repairs on nodes that already have a repair session
going, but it's not enough.

Correct me if I'm wrong: say you have A, B and C in the ring, with RF=2. If you run a repair
on node A, it will schedule a validation job on both B (its replica) and A (itself). After
Merkle trees are returned, it'll compare them and stream the differences. Now if you check
for running repairs on node B, you won't get any (the repair session is on node A). But scheduling
a repair on B causes it to submit a validation job on itself, and there's one running already,
so the "multiple repairs" exception gets thrown.

I reproduced this with a local 4-node cluster: I ran {{nodetool repair -inc -pr -par -local
keyspace}} on one node and then the same command on its replica while the replica was still
running a validation job triggered by the first repair. The repair on the replica immediately
errored out with the "multiple repairs" exception.
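
For reference, the reproduction boils down to something like this (a sketch only: the addresses and the sleep are made up, I actually ran the two commands by hand):

{code}
import subprocess
import time

REPAIR = ['repair', '-inc', '-pr', '-par', '-local', 'keyspace']

# Start the repair on the first node; it submits validation jobs on itself
# and on its replica.
first = subprocess.Popen(['nodetool', '-h', '127.0.0.1'] + REPAIR)

# Give the coordinator a moment to kick off validation, then run the same
# repair against the replica while that validation is still in flight.
time.sleep(10)
rc = subprocess.call(['nodetool', '-h', '127.0.0.2'] + REPAIR)
print('repair on replica exited with', rc)  # non-zero: "multiple repairs" error

first.wait()
{code}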

This means you should avoid scheduling repairs not only on a node that's already running
one, but also on its adjacent replica nodes, in order to avoid hitting the SSTable leak bug.
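
The scheduling rule I have in mind would look roughly like this, assuming SimpleStrategy-style placement where each node's ranges are replicated to the next RF-1 nodes in ring order (a sketch, not how our scheduler is actually written):

{code}
def nodes_to_exclude(ring, node, rf):
    """Nodes whose running repair makes it unsafe to start one on `node`."""
    n = len(ring)
    i = ring.index(node)
    excluded = {node}
    for k in range(1, rf):
        excluded.add(ring[(i + k) % n])  # replicas `node` sends validation jobs to
        excluded.add(ring[(i - k) % n])  # nodes whose repairs validate on `node`
    return excluded

# With the A/B/C ring and RF=2 from above, every node ends up excluded:
# nodes_to_exclude(['A', 'B', 'C'], 'A', rf=2) -> {'A', 'B', 'C'}
{code}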



> SSTable ancestor leaked reference
> ---------------------------------
>
>                 Key: CASSANDRA-11209
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11209
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction
>            Reporter: Jose Fernandez
>            Assignee: Marcus Eriksson
>         Attachments: screenshot-1.png, screenshot-2.png
>
>
> We're running a fork of 2.1.13 that adds the TimeWindowCompactionStrategy from [~jjirsa].
> We had been running 4 clusters without any issues for many months, until a few weeks ago when
> we started scheduling incremental repairs every 24 hours (previously we didn't run any repairs
> at all).
> Since then we've started noticing big discrepancies between the LiveDiskSpaceUsed and
> TotalDiskSpaceUsed metrics and the actual size of files on disk. The numbers are brought back
> in sync by restarting the node. We also noticed that when this bug happens there are several
> ancestors that don't get cleaned up. A restart will queue up a lot of compactions that slowly
> eat away at the ancestors.
> I looked at the code and noticed that we only decrease the LiveTotalDiskUsed metric in
> the SSTableDeletingTask. Since no errors are being logged, I'm assuming that for some reason
> this task is not getting queued up. If I understand correctly, the task only gets queued when
> the reference count for the SSTable reaches 0. So this leads us to believe that something
> during repairs and/or compactions is causing a reference leak on the ancestor SSTable.
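
To spell out the mechanism described in the report, here's an illustrative toy model (not Cassandra's actual classes): the deleting task only runs once the reference count drops to zero, so a single leaked reference keeps both the files and the disk-usage metric stuck.

{code}
# Toy model only -- stand-ins for SSTable reference counting and
# SSTableDeletingTask, not Cassandra's real classes.
class SSTableRef:
    def __init__(self, name, size, metrics):
        self.name, self.size, self.metrics = name, size, metrics
        self.refs = 1  # the "live" reference held by the table

    def ref(self):
        self.refs += 1

    def unref(self):
        self.refs -= 1
        if self.refs == 0:
            # Stand-in for SSTableDeletingTask: delete files and decrement the metric.
            self.metrics['LiveDiskSpaceUsed'] -= self.size

metrics = {'LiveDiskSpaceUsed': 100}
ancestor = SSTableRef('ancestor-1', 100, metrics)
ancestor.ref()    # e.g. a repair/validation takes a reference...
ancestor.unref()  # ...the original reference is released after compaction...
print(metrics)    # ...but the leaked reference never is: {'LiveDiskSpaceUsed': 100}
{code}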



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
