X-Original-To: apmail-cassandra-commits-archive@www.apache.org
Delivered-To: apmail-cassandra-commits-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
    by minotaur.apache.org (Postfix) with SMTP id A78BE17D81
    for ; Wed, 10 Jun 2015 16:20:01 +0000 (UTC)
Received: (qmail 93338 invoked by uid 500); 10 Jun 2015 16:20:01 -0000
Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org
Received: (qmail 93306 invoked by uid 500); 10 Jun 2015 16:20:01 -0000
Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cassandra.apache.org
Delivered-To: mailing list commits@cassandra.apache.org
Received: (qmail 93295 invoked by uid 99); 10 Jun 2015 16:20:01 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28)
    by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 10 Jun 2015 16:20:01 +0000
Date: Wed, 10 Jun 2015 16:20:01 +0000 (UTC)
From: "Alan Boudreault (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-9573) OOM when loading compressed sstables (system.hints)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

    [ https://issues.apache.org/jira/browse/CASSANDRA-9573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580728#comment-14580728 ]

Alan Boudreault commented on CASSANDRA-9573:
--------------------------------------------

No.
everything is good on 2.1.

> OOM when loading compressed sstables (system.hints)
> ---------------------------------------------------
>
>                 Key: CASSANDRA-9573
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9573
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Alan Boudreault
>            Priority: Critical
>             Fix For: 2.2.0 rc2
>
>         Attachments: hs_err_pid11243.log, java-hints-issue-2015-06-09.snapshot, system.log, yourkit.ss.tar.gz
>
>
> [~andrew.tolbert] discovered an issue while running endurance tests on 2.2. A node was not able to start and was killed by the OOM killer.
> Briefly, Cassandra uses an excessive amount of memory when loading compressed sstables (off-heap?). We initially saw the issue with system.hints before knowing it was related to compression. system.hints uses LZ4 compression by default. If we have an sstable of, say, 8-10G, Cassandra will be killed by the OOM killer after 1-2 minutes. I can reproduce the bug every time locally.
> * The issue also happens if we have 10G of data split into 13MB sstables.
> * I can reproduce the issue if I put a lot of data in the system.hints table.
> * I cannot reproduce the issue with a standard table using the same compression (LZ4). Something seems to be different when it's hints?
> You won't see anything in the node's system.log, but you'll see this in /var/log/syslog.log:
> {code}
> Out of memory: Kill process 30777 (java) score 600 or sacrifice child
> {code}
> The issue was introduced in this commit, but is not related to the performance issue in CASSANDRA-9240: https://github.com/apache/cassandra/commit/aedce5fc6ba46ca734e91190cfaaeb23ba47a846
> The core dump and some YourKit snapshots are in the attachments. I am not sure you will be able to get useful information from them.
> core dump: http://dl.alanb.ca/core.tar.gz
> Not sure if this is related, but all dumps and snapshots point to EstimatedHistogramReservoir ... and we can see many javax.management.InstanceAlreadyExistsException: org.apache.cassandra.metrics:... exceptions in system.log before it hangs and then crashes.
> To reproduce the issue:
> 1. Create a cluster of 3 nodes.
> 2. Start the whole cluster.
> 3. Shut down node2 and node3.
> 4. Write 10-15G of data on node1 with replication factor 3. You should see a lot of hints.
> 5. Stop node1.
> 6. Start node2 and node3.
> 7. Start node1; it should OOM.
> //cc [~tjake] [~benedict] [~andrew.tolbert]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)