cassandra-commits mailing list archives

From "T Jake Luciani (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-8639) Can OOM on CL replay with dense mutations
Date Tue, 01 Dec 2015 15:08:11 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033829#comment-15033829 ]

T Jake Luciani edited comment on CASSANDRA-8639 at 12/1/15 3:07 PM:
--------------------------------------------------------------------

This isn't related to the ticket, but maybe we should fix it as well: I don't see anywhere
that we wait for the replay futures to complete before recover() finishes.
Both the 2.1 code and your patch will exit early, before all the futures have finished.  It looks
like the old version only waited when there were more than the maximum number of outstanding
mutations, which is also wrong and racy.  We should always wait for the queue to drain
completely before the method exits.
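
To make the point concrete, here is a minimal, self-contained sketch of the pattern being asked for. It is not the Cassandra code: the class name, `replayAll`, `MAX_OUTSTANDING`, and the counter standing in for "apply a mutation" are all hypothetical. The loop applies backpressure only when too many futures are outstanding, and then drains the queue unconditionally before returning.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names come from Cassandra.
final class ReplayDrainSketch {
    // Assumed cap on outstanding replay tasks (stands in for "max outstanding mutations").
    static final int MAX_OUTSTANDING = 4;

    // Stand-in for recover(): submits one task per mutation and returns how many were applied.
    static int replayAll(int mutationCount) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        Queue<Future<?>> futures = new ArrayDeque<>();
        AtomicInteger applied = new AtomicInteger();
        Runnable applyMutation = applied::incrementAndGet; // placeholder for real replay work
        try {
            for (int i = 0; i < mutationCount; i++) {
                futures.add(executor.submit(applyMutation));
                // Backpressure: block on the oldest future only while over the cap.
                while (futures.size() > MAX_OUTSTANDING)
                    futures.remove().get();
            }
            // Unconditional drain: wait for every remaining future before returning,
            // even if the cap was never exceeded.
            while (!futures.isEmpty())
                futures.remove().get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("replay failed", e);
        } finally {
            executor.shutdown();
        }
        return applied.get();
    }
}
```

Without the final loop, a run with fewer than `MAX_OUTSTANDING` mutations would return while tasks could still be executing, which is the race described above. Only `add`, `remove`, and `isEmpty` are needed here, so a plain `Queue` reference is enough.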

I'm not sure why futures was changed to a deque.  It looks like you only use queue methods, but
maybe I missed something?

The only other thing I noticed was in the test: you should validate that the test data is not
found after you clear the CF, in case the replay isn't working.



You also have a 2.1 utest failure related to the CL; not sure if it's related to this change:  org.apache.cassandra.cql3.DropKeyspaceCommitLogRecycleTest.testRecycle
And one dtest failure in 2.1: commitlog_test.TestCommitLog.test_bad_crc

I didn't check the other versions yet.


> Can OOM on CL replay with dense mutations
> -----------------------------------------
>
>                 Key: CASSANDRA-8639
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8639
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>            Reporter: T Jake Luciani
>            Assignee: Ariel Weisberg
>            Priority: Minor
>             Fix For: 2.1.x
>
>
> If you write dense mutations with many clustering keys, the replay of the CL can quickly
> overwhelm a node on startup.  This looks to be caused by the fact that we only ensure there are
> at most 1000 mutations in flight at a time, but those mutations could have thousands of cells in them.
> A better approach would be to limit the CL replay by the amount of memory in flight, using
> cell.unsharedHeapSize().
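
The approach proposed in the description, bounding replay by bytes in flight rather than by mutation count, could be sketched roughly as follows. This is an assumption-laden illustration, not the actual patch: the class, `replay`, and the `mutationSizes` array are hypothetical, and each size stands in for whatever a real implementation would compute per mutation (for example by summing `unsharedHeapSize()` over its cells). A `Semaphore` holds the byte budget; each task takes permits equal to its size before it is submitted and returns them when it finishes.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; none of these names come from Cassandra.
final class ByteBoundedReplaySketch {
    // Replays mutations of the given (assumed, precomputed) sizes while keeping at most
    // maxBytesInFlight bytes outstanding. Returns the peak bytes in flight observed.
    static long replay(int[] mutationSizes, int maxBytesInFlight) {
        Semaphore budget = new Semaphore(maxBytesInFlight);
        AtomicLong inFlight = new AtomicLong();
        AtomicLong peak = new AtomicLong();
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            for (int size : mutationSizes) {
                // A mutation larger than the whole budget takes the whole budget and runs
                // alone, rather than blocking forever on permits that can never exist.
                int cost = Math.min(size, maxBytesInFlight);
                budget.acquireUninterruptibly(cost); // blocks while the budget is exhausted
                peak.accumulateAndGet(inFlight.addAndGet(cost), Math::max);
                executor.execute(() -> {
                    try {
                        // Placeholder: the real mutation would be applied here.
                    } finally {
                        inFlight.addAndGet(-cost);
                        budget.release(cost);
                    }
                });
            }
            // Reacquiring the full budget waits until every task has released its share.
            budget.acquireUninterruptibly(maxBytesInFlight);
        } finally {
            executor.shutdown();
        }
        return peak.get();
    }
}
```

With a count-based limit, 1000 small mutations and 1000 mutations of thousands of cells each are treated the same; with a byte budget, the number of concurrent tasks shrinks automatically as mutations get denser.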



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
