cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Petrov (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-9530) SSTable corruption can trigger OOM
Date Fri, 20 May 2016 16:09:12 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293614#comment-15293614
] 

Alex Petrov edited comment on CASSANDRA-9530 at 5/20/16 4:09 PM:
-----------------------------------------------------------------

bq. It also failed with an OOM here and here
Great :) that means it worked! 

We're catching {{EOF}} Exceptions now in several places (after changes):
{code}
org.apache.cassandra.db.rows.Cell$Serializer.deserialize(Cell.java:246)
{code}
and
{code}
org.apache.cassandra.db.ClusteringPrefix$Deserializer.deserializeOne(ClusteringPrefix.java:513)
{code}

Although all these EOFs would also be caught given the value is sufficiently large to blow
up the heap (with the new {{max_value_size_in_mb}}). 
While working on improvements, I have also discovered that yet another code path {{SStableScanner}}
(used in for the partition range reads) was not tested. This code path also had out of boundary
exceptions and was throwing similar EOFs. I wrote tests to cover both cases. 

I also wrote the test (not included into the patch) for early corrupting early opened compaction
result, although I'm not sure how useful it is in general. Early opened file will get corrupted,
although there's no way to catch this corruption until the next compaction or sstable read
/ scan occurs. Testing this is also non-trivial, as if we trigger compaction, it runs asyncronously,
so the only reasonable way it is to create a rewriter manually and catch the moment when we
have a new sstable in {{EARLY}} state. Since existing cases already do cover the situation,
I decided to leave it out. I might have missed something, so please let me know.

cassandra.yaml changes are made according to your comments. 
BlackListingCompactionTest.java is adjusted to print the seed and write randomised values.
All invocations of {{AbstractType.readValue()}} are now capped to protect from OOM. A single
instance of {{ByteBufferUtil.readWithShortLength}} is converted to {{skip}} in order to avoid
allocating a throwaway buffer. I thought it's also worth a separate commit.

I'll be putting the tests on CI for a couple of dozen runs (even though I ran them locally
a couple of hundred times already) to verify that nothing was accidentally forgotten.


was (Author: ifesdjeen):
> It also failed with an OOM here and here
Great :) that means it worked! 

We're catching {{EOF}} Exceptions now in several places (after changes):

{code}
org.apache.cassandra.db.rows.Cell$Serializer.deserialize(Cell.java:246)
{code}
and
{code}
org.apache.cassandra.db.ClusteringPrefix$Deserializer.deserializeOne(ClusteringPrefix.java:513)
{code}

Although all these EOFs would also be caught given the value is sufficiently large to blow
up the heap (with the new {{max_value_size_in_mb}}). 
While working on improvements, I have also discovered that yet another code path ({{SStableScanner}}
(used in for the partition range reads) was not tested. This code path also had out of boundary
exceptions and was throwing similar EOFs. I wrote tests to cover both cases. 

I also wrote the test (not included into the patch) for early corrupting early opened compaction
result, although I'm not sure how useful it is in general. Early opened file will get corrupted,
although there's no way to catch this corruption until the next compaction or sstable read
/ scan occurs. Testing this is also non-trivial, as if we trigger compaction, it runs asyncronously,
so the only reasonable way it is to create a rewriter manually and catch the moment when we
have a new sstable in {{EARLY}} state. Since existing cases already do cover the situation,
I decided to leave it out. I might have missed something, so please let me know.

cassandra.yaml changes are made according to your comments. 
BlackListingCompactionTest.java is adjusted to print the seed and write randomised values.
All invocations of {{AbstractType.readValue()}} are now capped to protect from OOM. A single
instance of {{ByteBufferUtil.readWithShortLength}} is converted to {{skip}} in order to avoid
allocating a throwaway buffer. I thought it's also worth a separate commit.

I'll be putting the tests on CI for a couple of dozen runs (even though I ran them locally
a couple of hundred times already) to verify that nothing was accidentally forgotten.

> SSTable corruption can trigger OOM
> ----------------------------------
>
>                 Key: CASSANDRA-9530
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9530
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>            Assignee: Alex Petrov
>
> If a sstable is corrupted so that the length of a given is bogus, we'll still happily
try to allocate a buffer of that bogus size to read the value, which can easily lead to an
OOM.
> We should probably protect against this. In practice, a given value can be so big since
it's limited by the protocol frame size in the first place. Maybe we could add a max_value_size_in_mb
setting and we'd considered a sstable corrupted if it was containing a value bigger than that.
> I'll note that this ticket would be a good occasion to improve {{BlacklistingCompactionsTest}}.
Typically, it currently generate empty values which makes it pretty much impossible to get
the problem described here. And as described in CASSANDRA-9478, it also doesn't test properly
for thing like early opening of compaction results. We could try to randomize as much of the
parameters of this test as possible to make it more likely to catch any type of corruption
that could happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message