cassandra-commits mailing list archives

From "Philip Thompson (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8052) OOMs from allocating large arrays when deserializing (e.g probably corrupted EstimatedHistogram data)
Date Wed, 01 Apr 2015 15:22:53 GMT


Philip Thompson commented on CASSANDRA-8052:

I believe the correct operational behavior is to run scrub on the afflicted sstable before
attempting to replace the node. Have you reproduced a similar exception in 2.0 or 2.1? 

[~aweisberg], when you get the chance, can you look over the reporter's concerns for 2.1 and
let me know if you think there's a possible risk there?

> OOMs from allocating large arrays when deserializing (e.g probably corrupted EstimatedHistogram data)
> -----------------------------------------------------------------------------------------------------
>                 Key: CASSANDRA-8052
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: linux
>            Reporter: Matt Byrd
>              Labels: OOM, checksum, corruption, oom, serialization
> We've seen nodes with what are presumably corrupted sstables repeatedly OOM on attempted
startup with a message such as:
> {code}
> java.lang.OutOfMemoryError: Java heap space
>  at org.apache.cassandra.utils.EstimatedHistogram$EstimatedHistogramSerializer.deserialize(

> at$SSTableMetadataSerializer.deserialize(
>  at$SSTableMetadataSerializer.deserialize(
>  at
>  at
>  at
>  at$
>  at java.util.concurrent.Executors$
>  at java.util.concurrent.FutureTask$Sync.innerRun(
>  at
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(
>  at java.util.concurrent.ThreadPoolExecutor$
>  at
> {code}
> It's probably not a coincidence that it's throwing an exception here, since this appears
to be the first read from the file.
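The failure mode in the stack trace above can be sketched as follows: a deserializer reads a length prefix from a (possibly corrupted) stream and allocates an array of that size before any sanity check, so a few flipped bytes can demand a multi-gigabyte allocation. A minimal Java illustration of the pattern (hypothetical names, not the actual Cassandra code):

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class NaiveDeserialize {
    // Mirrors the problematic pattern: trust the length prefix blindly.
    static long[] readOffsets(DataInputStream in) throws IOException {
        int size = in.readInt();          // corrupted data may yield e.g. Integer.MAX_VALUE
        long[] offsets = new long[size];  // allocation happens before any validation -> OOM risk
        for (int i = 0; i < size; i++) {
            offsets[i] = in.readLong();
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        // A well-formed stream: size = 2, then two longs.
        byte[] good = {0, 0, 0, 2,
                       0, 0, 0, 0, 0, 0, 0, 5,
                       0, 0, 0, 0, 0, 0, 0, 9};
        long[] offsets = readOffsets(new DataInputStream(new ByteArrayInputStream(good)));
        System.out.println(offsets.length);  // 2
        // If the first four bytes were corrupted to a huge value, new long[size]
        // would attempt a multi-gigabyte allocation and throw OutOfMemoryError.
    }
}
```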
> Presumably the correct operational process is just to replace the node;
> however, I was wondering whether we might generally want to validate lengths when we deserialise.
> This would avoid allocating large byte buffers that cause unpredictable OOMs, and instead
throw an exception that can be handled as appropriate.
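The validation suggested here can be sketched as a guarded read: compare the claimed length against a sane upper bound and fail with a checked exception rather than allocating. A hypothetical helper, not Cassandra's actual API:

```java
import java.io.DataInput;
import java.io.IOException;

public final class SafeLengths {
    private SafeLengths() {}

    /**
     * Reads a length prefix and rejects values outside [0, max],
     * turning corruption into a recoverable IOException instead of an OOM.
     */
    public static int readValidatedLength(DataInput in, int max) throws IOException {
        int length = in.readInt();
        if (length < 0 || length > max) {
            throw new IOException("Corrupt length prefix: " + length
                                  + " (max allowed " + max + ")");
        }
        return length;
    }
}
```

A caller deserializing an EstimatedHistogram could then bound the bucket count at some generous but finite value, since the on-disk histogram never legitimately needs an enormous one.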
> In this particular instance, there is no need for an unduly large size for the estimated histogram.
> Admittedly things are slightly different in 2.1, though I suspect a similar thing might
have happened with:
> {code}
>        int numComponents = in.readInt();
>        // read toc
>        Map<MetadataType, Integer> toc = new HashMap<>(numComponents); 
> {code}
> Doing a find-usages on DataInputStream.readInt() reveals quite a few places where an
int is read in and then an ArrayList, array or map of that size is created.
> In some cases the size might validly range over the whole of a Java int,
> or the read might be in a performance-critical or delicate piece of code where one doesn't want such a check.
> Also there are other checksums and mechanisms at play which make some input less likely
to be corrupted.
> However, is it maybe worth a pass over instances of this type of input, to try and avoid
such cases where it makes sense?
> Perhaps there are less likely but worse failure modes present and hidden?
> E.g. if the deserialisation happens to be for a message sent to some or all nodes, say.

This message was sent by Atlassian JIRA
