cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Paulo Motta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-13948) Reload compaction strategies when JBOD disk boundary changes
Date Wed, 06 Dec 2017 14:36:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16280241#comment-16280241
] 

Paulo Motta commented on CASSANDRA-13948:
-----------------------------------------

bq. need to check the trunk patch as well, cancelling "ready to commit" (i hope)

The merge went smoothly, most of the conflicts were related to CASSANDRA-9143, so I [updated|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-13948#diff-f9c882c974db60a710cf1f195cfdb801R113]
{{CompactionStrategyManagerTest}} to mark a subset of sstables as repaired and pending repair
to make sure sstables are being assigned the correct strategies for repaired and pending repair
sstables.

However, there were 2 test failures in the trunk branch after the merge:
1. [testSetLocalCompactionStrategy|https://github.com/pauloricardomg/cassandra/blob/41416af426c41cdd38e157f38fb440342bca4dd0/test/unit/org/apache/cassandra/db/compaction/CompactionsCQLTest.java#L163]
2. [disk_balance_bootstrap_test|https://github.com/pauloricardomg/cassandra-dtest/blob/73d7a8e1deb5eab05867d804933621062c2f6762/disk_balance_test.py#L34]

1. was failing [here|https://github.com/pauloricardomg/cassandra/blob/41416af426c41cdd38e157f38fb440342bca4dd0/test/unit/org/apache/cassandra/db/compaction/CompactionsCQLTest.java#L175]
because {{ALTER TABLE t WITH gc_grace_seconds = 1000}} was causing the manually set compaction
strategy to be replaced by the strategy defined on the schema. After investigation, it turned
out that the disk boundaries were being invalidated due to the schema reload (introduced by
CASSANDRA-9425), and {{maybeReload(TableMetadata)}} was causing the compaction strategies
to be reloaded with the schema settings instead of the manually set settings. In order to
fix this, I split {{maybeReload}} in the original {{maybeReload(TableMetadata)}}, which should
be called externally by {{ColumnFamilyStore}} and only reloads the strategies when the schema
table parameters change, and {{maybeReloadDiskBoundaries}} which is used internally and reloads
the compaction strategies with the same table settings when the disk boundaries are invalidated
[here|https://github.com/apache/cassandra/commit/de5916e7c4f37736d5e1d06f0fc2b9c082b6bb99].

2.since the local ranges are not defined when the bootstrapping node starts, the disk boundaries
are empty, but before CASSANDRA-9425 the boundaries were invalidated during keyspace construction
([here|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/config/Schema.java#L388]),
so the correct boundaries were used during streaming. After CASSANDRA-9425  the boundaries
were no longer reloaded during keyspace creation ([here|https://github.com/apache/cassandra/blob/4c80eeece37d79f434078224a0504400ae10a20d/src/java/org/apache/cassandra/schema/Schema.java#L138]),
so the empty boundaries were used during streaming and the disks were imbalanced. I had exactly
the same problem on CASSANDRA-14083 ([here|https://issues.apache.org/jira/browse/CASSANDRA-14083?focusedCommentId=16272918&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16272918]).
The solution is to invalidate the disk boundaries after the tokens are set during bootstrap
([here|https://github.com/apache/cassandra/commit/a37bbda45142e1b351908a4ff5196eb08e92082b]).

After these two fixes, the tests were passing (failures seem unrelated - test screenshots
from internal CI below):

||3.11||trunk||dtest||
|[branch|https://github.com/apache/cassandra/compare/cassandra-3.11...pauloricardomg:3.11-13948]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-13948]|[branch|https://github.com/apache/cassandra-dtest/compare/master...pauloricardomg:13948]|
|[testall|https://issues.apache.org/jira/secure/attachment/12900864/3.11-13948-testall.png]|[testall|https://issues.apache.org/jira/secure/attachment/12900862/trunk-13948-testall.png]|
|[dtest|https://issues.apache.org/jira/secure/attachment/12900865/3.11-13948-dtest.png]|[dtest|https://issues.apache.org/jira/secure/attachment/12900863/trunk-13948-dtest.png]|

> Reload compaction strategies when JBOD disk boundary changes
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-13948
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13948
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction
>            Reporter: Paulo Motta
>            Assignee: Paulo Motta
>             Fix For: 3.11.x, 4.x
>
>         Attachments: 13948dtest.png, 13948testall.png, 3.11-13948-dtest.png, 3.11-13948-testall.png,
debug.log, dtest13948.png, dtest2.png, threaddump-cleanup.txt, threaddump.txt, trace.log,
trunk-13948-dtest.png, trunk-13948-testall.png
>
>
> The thread dump below shows a race between an sstable replacement by the {{IndexSummaryRedistribution}}
and {{AbstractCompactionTask.getNextBackgroundTask}}:
> {noformat}
> Thread 94580: (state = BLOCKED)
>  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
>  - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=175 (Compiled
frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() @bci=1,
line=836 (Compiled frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node,
int) @bci=67, line=870 (Compiled frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(int) @bci=17, line=1199
(Compiled frame)
>  - java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock() @bci=5, line=943
(Compiled frame)
>  - org.apache.cassandra.db.compaction.CompactionStrategyManager.handleListChangedNotification(java.lang.Iterable,
java.lang.Iterable) @bci=359, line=483 (Interpreted frame)
>  - org.apache.cassandra.db.compaction.CompactionStrategyManager.handleNotification(org.apache.cassandra.notifications.INotification,
java.lang.Object) @bci=53, line=555 (Interpreted frame)
>  - org.apache.cassandra.db.lifecycle.Tracker.notifySSTablesChanged(java.util.Collection,
java.util.Collection, org.apache.cassandra.db.compaction.OperationType, java.lang.Throwable)
@bci=50, line=409 (Interpreted frame)
>  - org.apache.cassandra.db.lifecycle.LifecycleTransaction.doCommit(java.lang.Throwable)
@bci=157, line=227 (Interpreted frame)
>  - org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit(java.lang.Throwable)
@bci=61, line=116 (Compiled frame)
>  - org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit()
@bci=2, line=200 (Interpreted frame)
>  - org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.finish()
@bci=5, line=185 (Interpreted frame)
>  - org.apache.cassandra.io.sstable.IndexSummaryRedistribution.redistributeSummaries()
@bci=559, line=130 (Interpreted frame)
>  - org.apache.cassandra.db.compaction.CompactionManager.runIndexSummaryRedistribution(org.apache.cassandra.io.sstable.IndexSummaryRedistribution)
@bci=9, line=1420 (Interpreted frame)
>  - org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(org.apache.cassandra.io.sstable.IndexSummaryRedistribution)
@bci=4, line=250 (Interpreted frame)
>  - org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries() @bci=30,
line=228 (Interpreted frame)
>  - org.apache.cassandra.io.sstable.IndexSummaryManager$1.runMayThrow() @bci=4, line=125
(Interpreted frame)
>  - org.apache.cassandra.utils.WrappedRunnable.run() @bci=1, line=28 (Interpreted frame)
>  - org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run()
@bci=4, line=118 (Compiled frame)
>  - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
>  - java.util.concurrent.FutureTask.runAndReset() @bci=47, line=308 (Compiled frame)
>  - java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask)
@bci=1, line=180 (Compiled frame)
>  - java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run() @bci=37,
line=294 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
@bci=95, line=1149 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624 (Interpreted
frame)
>  - org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(java.lang.Runnable)
@bci=1, line=81 (Interpreted frame)
>  - org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$8.run() @bci=4 (Interpreted
frame)
>  - java.lang.Thread.run() @bci=11, line=748 (Compiled frame)
> {noformat}
> {noformat}
> Thread 94573: (state = IN_JAVA)
>  - java.util.HashMap$HashIterator.nextNode() @bci=95, line=1441 (Compiled frame; information
may be imprecise)
>  - java.util.HashMap$KeyIterator.next() @bci=1, line=1461 (Compiled frame)
>  - org.apache.cassandra.db.lifecycle.View$3.apply(org.apache.cassandra.db.lifecycle.View)
@bci=20, line=268 (Compiled frame)
>  - org.apache.cassandra.db.lifecycle.View$3.apply(java.lang.Object) @bci=5, line=265
(Compiled frame)
>  - org.apache.cassandra.db.lifecycle.Tracker.apply(com.google.common.base.Predicate,
com.google.common.base.Function) @bci=13, line=133 (Compiled frame)
>  - org.apache.cassandra.db.lifecycle.Tracker.tryModify(java.lang.Iterable, org.apache.cassandra.db.compaction.OperationType)
@bci=31, line=99 (Compiled frame)
>  - org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(int)
@bci=84, line=139 (Compiled frame)
>  - org.apache.cassandra.db.compaction.CompactionStrategyManager.getNextBackgroundTask(int)
@bci=105, line=119 (Interpreted frame)
>  - org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run()
@bci=84, line=265 (Interpreted frame)
>  - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
>  - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
@bci=95, line=1149 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624 (Interpreted
frame)
>  - org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(java.lang.Runnable)
@bci=1, line=81 (Interpreted frame)
>  - org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$8.run() @bci=4 (Interpreted
frame)
>  - java.lang.Thread.run() @bci=11, line=748 (Compiled frame)
> {noformat}
> This particular node remain in this state forever, indicating {{LeveledCompactionStrategyTask.getNextBackgroundTask}}
was looping indefinitely.
> What happened is that sstable references were replaced on the tracker by the {{IndexSummaryRedistribution}}
thread, so the {{AbstractCompactionStrategy.getNextBackgroundTask}} could not create the transaction
with the old references, and the {{IndexSummaryRedistribution}} could not update the sstable
reference in the compaction strategy because {{AbstractCompactionStrategy.getNextBackgroundTask}}
was holding the {{CompactionStrategyManager}} lock.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org


Mime
View raw message