cassandra-commits mailing list archives

From "Jeff Jirsa (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9120) OutOfMemoryError when read auto-saved cache (probably broken)
Date Sun, 05 Apr 2015 20:16:33 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14396375#comment-14396375 ]

Jeff Jirsa commented on CASSANDRA-9120:
---------------------------------------

[~dbrosius] I don't disagree; I thought about cutting off at freeMemory instead of maxMemory,
but I was really just looking for a common-sense upper bound above which we can assume the cache
file is invalid and not even try to load it.
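
Something like this sketch is the shape I had in mind (illustrative only, not the committed
patch; the 8-byte divisor is a hypothetical per-entry lower bound):

{code}
import java.io.DataInput;
import java.io.IOException;

// Treat any entry count that could not possibly fit in the heap as proof
// that the cache file is corrupt, and refuse to load it at all.
static int readSaneEntryCount(DataInput in) throws IOException
{
    int entries = in.readInt();
    // ~8 bytes per entry (one object reference) is an assumed lower bound,
    // just to turn maxMemory into a cap on plausible entry counts.
    long cap = Runtime.getRuntime().maxMemory() / 8;
    if (entries < 0 || entries > cap)
        throw new IOException("Saved cache looks invalid: entry count " + entries);
    return entries;
}
{code}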

I'd agree that for trunk the right approach would be to load as many of the valid entries
as possible with the memory available, but in 2.0, a safer "if the cache file seems invalid
because its entry count wouldn't fit into all of memory, punt" may be appropriately
conservative for existing stable users?
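
For 2.0, the punt could look roughly like the following (a sketch only; loadSaved and logger
stand in for the real AutoSavingCache plumbing):

{code}
// Wrap the load so a corrupt file costs us the cache, not the node.
try
{
    loadSaved(cacheFile);
}
catch (Exception e)
{
    // Corruption in a saved cache is non-fatal: log it and start cold
    // instead of dying during startup.
    logger.info("Saved cache " + cacheFile + " appears invalid; skipping", e);
}
{code}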

> OutOfMemoryError when read auto-saved cache (probably broken)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-9120
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9120
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Linux
>            Reporter: Vladimir
>            Assignee: Jeff Jirsa
>             Fix For: 3.0, 2.0.15, 2.1.5
>
>
> Found during tests on a 100-node cluster. After a restart I found that one node constantly
crashes with an OutOfMemoryError. I guess the auto-saved cache was corrupted and Cassandra
can't recognize it. I see that similar issues were already fixed (where a negative size of some
structure was read). Does the auto-saved cache have a checksum? It'd help to reject a corrupted
cache at the very beginning.
> As far as I can see, the current code still has this problem. The stack trace is:
> {code}
> INFO [main] 2015-03-28 01:04:13,503 AutoSavingCache.java (line 114) reading saved cache
/storage/core/loginsight/cidata/cassandra/saved_caches/system-sstable_activity-KeyCache-b.db
> ERROR [main] 2015-03-28 01:04:14,718 CassandraDaemon.java (line 513) Exception encountered
during startup
> java.lang.OutOfMemoryError: Java heap space
>         at java.util.ArrayList.<init>(Unknown Source)
>         at org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:120)
>         at org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:365)
>         at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:119)
>         at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:262)
>         at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:421)
>         at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:392)
>         at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:315)
>         at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:272)
>         at org.apache.cassandra.db.Keyspace.open(Keyspace.java:114)
>         at org.apache.cassandra.db.Keyspace.open(Keyspace.java:92)
>         at org.apache.cassandra.db.SystemKeyspace.checkHealth(SystemKeyspace.java:536)
>         at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:261)
>         at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
>         at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)
> {code}
> I looked at the Cassandra source code and saw:
> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.cassandra/cassandra-all/2.0.10/org/apache/cassandra/db/RowIndexEntry.java
> {code}
> 119 int entries = in.readInt();
> 120 List<IndexHelper.IndexInfo> columnsIndex = new ArrayList<IndexHelper.IndexInfo>(entries);
> {code}
> It seems the entries value read from the corrupted file is garbage: an absurdly large count
makes the ArrayList allocation at line 120 hit the OOM. I deleted the saved_caches directory
and was able to start the node correctly. We should expect this to happen in the real world:
Cassandra should be able to skip incorrect cached data and still run.
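>
> A checksum could be as simple as the sketch below (a hypothetical format, not actual Cassandra
code): store a CRC32 of the serialized payload in the file and verify it before deserializing
anything.
> {code}
> import java.io.IOException;
> import java.util.zip.CRC32;
>
> // Hypothetical sketch: reject the whole file up front if the stored
> // CRC32 doesn't match the payload, instead of OOMing mid-deserialize.
> static void verifySavedCache(byte[] payload, long storedCrc) throws IOException
> {
>     CRC32 crc = new CRC32();
>     crc.update(payload);
>     if (crc.getValue() != storedCrc)
>         throw new IOException("Saved cache failed checksum; rejecting file");
> }
> {code}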



