From: "Dave Brosius (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Date: Sun, 5 Apr 2015 14:07:33 +0000 (UTC)
Subject: [jira] [Commented] (CASSANDRA-9120) OutOfMemoryError when read auto-saved cache (probably broken)

    [ https://issues.apache.org/jira/browse/CASSANDRA-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14396241#comment-14396241 ]

Dave Brosius commented on CASSANDRA-9120:
-----------------------------------------

Better, but it's still probably not a great idea to run all the way up to maxMemory. Also, this is an all-or-nothing thing; maybe we should read entries only up to a point.
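The "read up to a point" idea above can be sketched as follows. This is a minimal, hypothetical example (not the actual Cassandra patch); the class name `BoundedCacheRead` and the `MAX_ENTRIES` cap are illustrative inventions, standing in for whatever configurable limit the real fix would use. The point is to validate the serialized entry count before handing it to an `ArrayList` constructor:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class BoundedCacheRead {
    // Hypothetical cap; a real fix would take this from configuration,
    // not a hard-coded constant.
    static final int MAX_ENTRIES = 1_000_000;

    // Read an entry count and fail fast, instead of letting
    // new ArrayList<>(entries) blow the heap on a corrupt (e.g. negative
    // or absurdly large) value read from a damaged file.
    static List<Long> readEntries(DataInputStream in) throws IOException {
        int entries = in.readInt();
        if (entries < 0 || entries > MAX_ENTRIES)
            throw new IOException("corrupt saved cache: entry count " + entries);
        List<Long> keys = new ArrayList<>(entries);
        for (int i = 0; i < entries; i++)
            keys.add(in.readLong());
        return keys;
    }

    public static void main(String[] args) throws IOException {
        // A well-formed payload: count = 2, then two keys.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(2);
        out.writeLong(42L);
        out.writeLong(43L);
        System.out.println(readEntries(new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray()))));  // prints [42, 43]

        // A corrupt payload: a negative count is rejected instead of OOM-ing.
        ByteArrayOutputStream bad = new ByteArrayOutputStream();
        new DataOutputStream(bad).writeInt(-100);
        try {
            readEntries(new DataInputStream(new ByteArrayInputStream(bad.toByteArray())));
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a cap like this, a corrupted count costs one `IOException` rather than a heap-sized allocation, and the node can discard the saved cache and start cold.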
also, i think this should be changed to something rational:

{code}
public volatile int key_cache_keys_to_save = Integer.MAX_VALUE;
{code}

> OutOfMemoryError when read auto-saved cache (probably broken)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-9120
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9120
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Linux
>            Reporter: Vladimir
>            Assignee: Jeff Jirsa
>             Fix For: 3.0, 2.0.15, 2.1.5
>
> Found during tests on a 100-node cluster. After a restart I found that one node constantly crashed with an OutOfMemoryError. I guess that the auto-saved cache was corrupted and Cassandra can't recognize it. I see that a similar issue was already fixed (where a negative size of some structure was read). Does the auto-saved cache have a checksum? It would help to reject a corrupted cache at the very beginning.
> As far as I can see, the current code still has that problem. The stack trace is:
> {code}
> INFO [main] 2015-03-28 01:04:13,503 AutoSavingCache.java (line 114) reading saved cache /storage/core/loginsight/cidata/cassandra/saved_caches/system-sstable_activity-KeyCache-b.db
> ERROR [main] 2015-03-28 01:04:14,718 CassandraDaemon.java (line 513) Exception encountered during startup
> java.lang.OutOfMemoryError: Java heap space
>         at java.util.ArrayList.<init>(Unknown Source)
>         at org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:120)
>         at org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:365)
>         at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:119)
>         at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:262)
>         at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:421)
>         at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:392)
>         at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:315)
>         at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:272)
>         at org.apache.cassandra.db.Keyspace.open(Keyspace.java:114)
>         at org.apache.cassandra.db.Keyspace.open(Keyspace.java:92)
>         at org.apache.cassandra.db.SystemKeyspace.checkHealth(SystemKeyspace.java:536)
>         at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:261)
>         at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
>         at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)
> {code}
> I looked at the source code of Cassandra and see:
> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.cassandra/cassandra-all/2.0.10/org/apache/cassandra/db/RowIndexEntry.java
> {code}
> 119    int entries = in.readInt();
> 120    List columnsIndex = new ArrayList(entries);
> {code}
> It seems that the value of entries is invalid (negative), so it tries to allocate an ArrayList with a huge initial capacity and hits the OOM. I deleted the saved_caches directory and was able to start the node correctly. We should expect this to happen in the real world: Cassandra should be able to skip incorrect cached data and run.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
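The reporter's checksum question could be answered with something along these lines. This is a self-contained sketch under assumed conventions, not Cassandra's actual saved-cache file format: the `ChecksummedCache` class and its length-prefix-plus-trailing-CRC32 layout are illustrative only. It shows how a trailing checksum lets the loader reject a corrupted file up front instead of crashing inside deserialization:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class ChecksummedCache {
    // Write the payload with a length prefix and a trailing CRC32,
    // so corruption anywhere in the payload is detectable on load.
    static byte[] wrap(byte[] payload) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(payload.length);
        out.write(payload);
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        out.writeLong(crc.getValue());
        return buf.toByteArray();
    }

    // Returns the payload, or null if the length is impossible or the
    // checksum does not match; the caller would then discard the saved
    // cache and start cold instead of OOM-ing.
    static byte[] unwrap(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int len = in.readInt();
        if (len < 0 || len > data.length)
            return null;                      // corrupt length prefix
        byte[] payload = new byte[len];
        in.readFully(payload);
        CRC32 crc = new CRC32();
        crc.update(payload, 0, len);
        return crc.getValue() == in.readLong() ? payload : null;
    }

    public static void main(String[] args) throws IOException {
        byte[] stored = wrap("key-cache-entries".getBytes(StandardCharsets.UTF_8));
        System.out.println(unwrap(stored) != null);  // intact file: true
        stored[7] ^= 0x1;                            // flip one payload bit
        System.out.println(unwrap(stored) == null);  // corruption detected: true
    }
}
```

A check like this would also have caught the negative `entries` value from the stack trace above, since a damaged count field changes the checksum of the stored bytes.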