cassandra-commits mailing list archives

From "Alex Petrov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9530) SSTable corruption can trigger OOM
Date Wed, 25 May 2016 15:40:13 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300235#comment-15300235 ]

Alex Petrov commented on CASSANDRA-9530:
----------------------------------------

One of the failures was in the SASI index test and static initialisers. I couldn't find any
correlation with the current patch. It's fixed now (the config was being loaded earlier than
the configuration setting for it was forced, because of static initialisation).
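To illustrate the general class of problem (not the actual Cassandra code), here is a minimal Java sketch of how a static initialiser can capture a config value before a test has had a chance to override it. All class and field names below are hypothetical, and the default of 256 is chosen only for illustration.

```java
// Hypothetical stand-in for a global configuration holder.
final class StaticInitConfig {
    // Default value; a test would normally override this before use.
    static volatile int maxValueSizeMb = 256;
}

// Reads the setting once, at class initialisation time.
final class EagerReader {
    // Not a compile-time constant, so it is evaluated when EagerReader is first used.
    static final int LIMIT_MB = StaticInitConfig.maxValueSizeMb;
}

// Reads the setting on every call, so later overrides are visible.
final class LazyReader {
    static int limitMb() {
        return StaticInitConfig.maxValueSizeMb;
    }
}
```

If anything touches EagerReader before the test forces its own setting, EagerReader.LIMIT_MB keeps the default for the lifetime of the JVM, whereas LazyReader.limitMb() picks up the override.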

The second one is just another issue that our new randomisation caught. There was one more
place where a non-seeded random was used, but after several dozen runs I found another seed
that reproduces it: 754271160974509L, in case you'd like to look at it in more detail. In this
case the corruption causes a row to be deserialised as an empty one. The row is read normally,
but compaction fails on it. The sstable doesn't get marked as suspect and compaction fails
all 25 times.
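As a rough sketch of why seeding matters here: if every random choice in a corruption test is derived from one seed, a failing run can be replayed exactly by reusing that seed. The helper below is hypothetical and is not how the test in the patch is written. It only shows the idea, using java.util.Random.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper: overwrites random positions of a byte array with random bytes.
final class SeededCorruptor {
    /**
     * Returns a corrupted copy of data. The same (data, seed, writes) always
     * produces the same result, so a failure can be reproduced from the seed alone.
     * Assumes data is non-empty.
     */
    static byte[] corrupt(byte[] data, long seed, int writes) {
        Random random = new Random(seed);            // all randomness comes from the seed
        byte[] copy = Arrays.copyOf(data, data.length);
        for (int i = 0; i < writes; i++) {
            int position = random.nextInt(copy.length);
            copy[position] = (byte) random.nextInt(256);
        }
        return copy;
    }
}
```

Calling SeededCorruptor.corrupt(bytes, 754271160974509L, 8) twice on the same input yields identical output, which is the property that makes a reported seed useful.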

I've increased the number of multiplexed runs to 50 (ran both tests). Maybe we'll catch some
more things here. The commits in all branches are left unsquashed to make them simpler to go
through. I've also added more elaborate comments in the commit messages.

|[trunk|https://github.com/ifesdjeen/cassandra/tree/9530-trunk]|[utest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-9530-trunk-testall/]|[dtest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-9530-trunk-dtest/]|[multiplexed|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-9530-trunk-multiplexed/]|
|[3.0|https://github.com/ifesdjeen/cassandra/tree/9530-3.0]|[utest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-9530-3.0-testall/]|[dtest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-9530-3.0-dtest/]|[multiplexed|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-9530-3.0-multiplexed/]|
|[3.7|https://github.com/ifesdjeen/cassandra/tree/9530-3.7]|[utest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-9530-3.7-testall/]|[dtest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-9530-3.7-dtest/]|[multiplexed|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-9530-3.7-multiplexed/]|

The rest of the failures seem unrelated (they also pass locally).

> SSTable corruption can trigger OOM
> ----------------------------------
>
>                 Key: CASSANDRA-9530
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9530
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>            Assignee: Alex Petrov
>
> If an sstable is corrupted so that the length of a given value is bogus, we'll still happily
> try to allocate a buffer of that bogus size to read the value, which can easily lead to an
> OOM.
> We should probably protect against this. In practice, a given value can't be that big, since
> it's limited by the protocol frame size in the first place. Maybe we could add a max_value_size_in_mb
> setting and consider an sstable corrupted if it contains a value bigger than that.
> I'll note that this ticket would be a good occasion to improve {{BlacklistingCompactionsTest}}.
> Typically, it currently generates empty values, which makes it pretty much impossible to hit
> the problem described here. And as described in CASSANDRA-9478, it also doesn't properly test
> things like early opening of compaction results. We could try to randomize as many of the
> parameters of this test as possible to make it more likely to catch any type of corruption
> that could happen.
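The size check proposed in the description above could look roughly like the sketch below. It is not Cassandra's implementation: the class, the exception, and the on-disk layout (a 4-byte big-endian length followed by the value bytes) are assumptions made for illustration. The point is only that the length is validated before any buffer is allocated.

```java
import java.nio.ByteBuffer;

// Hypothetical reader for values stored as [int length][length bytes].
final class ValueSizeGuard {
    // Hypothetical signal that the data should be treated as corrupted.
    static final class CorruptValueException extends RuntimeException {
        CorruptValueException(String message) {
            super(message);
        }
    }

    /** Reads one value, rejecting implausible lengths before allocating anything. */
    static byte[] readValue(ByteBuffer in, int maxValueSizeBytes) {
        if (in.remaining() < Integer.BYTES)
            throw new CorruptValueException("Truncated length prefix");
        int length = in.getInt();
        // A corrupted length can be negative or absurdly large; check it first.
        if (length < 0 || length > maxValueSizeBytes)
            throw new CorruptValueException("Value length " + length + " exceeds limit " + maxValueSizeBytes);
        if (length > in.remaining())
            throw new CorruptValueException("Value length " + length + " exceeds remaining data");
        byte[] value = new byte[length];             // safe: bounded by the configured maximum
        in.get(value);
        return value;
    }
}
```

A caller would convert a setting such as max_value_size_in_mb to bytes once and pass it in. What happens after the exception (for example, marking the sstable as suspect) is left out of this sketch.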



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
