cassandra-commits mailing list archives

From "Vladimir (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-9120) OutOfMemoryError when read auto-saved cache (probably broken)
Date Sat, 04 Apr 2015 07:50:33 GMT
Vladimir created CASSANDRA-9120:
-----------------------------------

             Summary: OutOfMemoryError when read auto-saved cache (probably broken)
                 Key: CASSANDRA-9120
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9120
             Project: Cassandra
          Issue Type: Bug
         Environment: Linux
            Reporter: Vladimir
             Fix For: 2.0.14


Found during tests on a 100-node cluster. After a restart, I found that one node constantly
crashed with an OutOfMemoryError. I guess the auto-saved cache was corrupted and Cassandra
can't recognize that. I see that similar issues were already fixed (where a negative size of some
structure was read). Does the auto-saved cache have a checksum? It would help to reject a
corrupted cache at the very beginning.
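A trailing checksum along these lines could look like the following minimal sketch. It is illustrative only: the class name, file layout, and the "return null on corruption" convention are assumptions for the example, not Cassandra's actual saved-cache format.

```java
import java.io.*;
import java.util.zip.CRC32;

// Hypothetical sketch of the checksum idea: append a CRC32 of the payload
// when saving, and verify it before deserializing on load. A null result
// tells the caller to discard the saved cache and start cold instead of
// feeding garbage into the deserializer.
public class ChecksummedCache {
    public static void save(File f, byte[] payload) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(payload);
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(f))) {
            out.writeInt(payload.length);
            out.write(payload);
            out.writeLong(crc.getValue());   // trailing checksum
        }
    }

    public static byte[] load(File f) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(f))) {
            int len = in.readInt();
            if (len < 0) return null;        // corrupt length field
            byte[] payload = new byte[len];
            in.readFully(payload);
            long expected = in.readLong();
            CRC32 crc = new CRC32();
            crc.update(payload);
            return crc.getValue() == expected ? payload : null; // reject on mismatch
        } catch (EOFException e) {
            return null;                      // truncated file
        }
    }
}
```

With this shape, a node that finds a corrupt or truncated saved cache simply warms up cold rather than failing at startup.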

As far as I can see, the current code still has this problem. The stack trace is:

INFO [main] 2015-03-28 01:04:13,503 AutoSavingCache.java (line 114) reading saved cache /storage/core/loginsight/cidata/cassandra/saved_caches/system-sstable_activity-KeyCache-b.db
ERROR [main] 2015-03-28 01:04:14,718 CassandraDaemon.java (line 513) Exception encountered
during startup
java.lang.OutOfMemoryError: Java heap space
        at java.util.ArrayList.<init>(Unknown Source)
        at org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:120)
        at org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:365)
        at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:119)
        at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:262)
        at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:421)
        at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:392)
        at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:315)
        at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:272)
        at org.apache.cassandra.db.Keyspace.open(Keyspace.java:114)
        at org.apache.cassandra.db.Keyspace.open(Keyspace.java:92)
        at org.apache.cassandra.db.SystemKeyspace.checkHealth(SystemKeyspace.java:536)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:261)
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)

I looked at the Cassandra source code and saw:
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.cassandra/cassandra-all/2.0.10/org/apache/cassandra/db/RowIndexEntry.java

119 int entries = in.readInt();
120 List<IndexHelper.IndexInfo> columnsIndex = new ArrayList<IndexHelper.IndexInfo>(entries);

It seems that the value of entries read from the corrupted file is invalid, so the constructor
tries to allocate an ArrayList with a huge initial capacity and hits the OOM. (Note that a
strictly negative count would make the ArrayList constructor throw IllegalArgumentException
rather than OOM, so the count here was likely a huge positive garbage value.) After I deleted
the saved_caches directory, the node started correctly. We should expect this to happen in the
real world: Cassandra should be able to skip corrupt cached data and keep running.
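One defensive pattern for this spot is to validate the serialized count before sizing a collection from it. This is a sketch of the idea only, not Cassandra's actual fix; the bound below is an assumed sanity limit, and the method name is hypothetical.

```java
import java.io.*;

// Illustrative guard for the pattern at RowIndexEntry.java:119-120:
// reject an implausible entry count instead of handing it straight to
// new ArrayList<>(entries). MAX_REASONABLE_ENTRIES is an assumption for
// the example, not a value taken from the Cassandra codebase.
public class SafeDeserialize {
    static final int MAX_REASONABLE_ENTRIES = 1 << 20; // assumed upper bound

    public static int readEntryCount(DataInput in) throws IOException {
        int entries = in.readInt();
        if (entries < 0 || entries > MAX_REASONABLE_ENTRIES)
            throw new IOException("corrupt saved cache: entry count " + entries);
        return entries;
    }
}
```

Throwing IOException here lets the cache loader catch it, log a warning, and continue with an empty cache, instead of dying on OutOfMemoryError during startup.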



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
