Date: Wed, 14 Dec 2011 03:46:30 +0000 (UTC)
From: "Jonathan Ellis (Commented) (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Message-ID: <1226167948.9457.1323834390810.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <871992654.7184.1323808531010.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (CASSANDRA-3624) Hinted Handoff - related OOM

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169056#comment-13169056 ]

Jonathan Ellis commented on CASSANDRA-3624:
-------------------------------------------

bq. The performance hit would be small since we are doing the hinted handoff throttle delay sleep before sending every mutation anyway

True, but this is likely to change (see Jake's comments on CASSANDRA-3554).

> Hinted Handoff - related OOM
> ----------------------------
>
>                 Key: CASSANDRA-3624
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3624
>             Project: Cassandra
>          Issue Type: Bug
>   Affects Versions: 1.0.0
>            Reporter: Marcus Eriksson
>            Assignee: Jonathan Ellis
>              Labels: hintedhandoff
>             Fix For: 1.0.7
>
>         Attachments: 3624.txt
>
>
> One of our nodes had collected a lot of hints for another node, so when the dead node came back and the row mutations were read back from disk, the node died with an OOM exception (and kept dying after restart, even with the heap increased from 8G to 12G). The heap dump contained a lot of SuperColumns, and our application does not use those (but HH does).
> I'm guessing that each mutation is big so that PAGE_SIZE* does not fit in memory (will check this tomorrow).
> A simple fix (if my assumption above is correct) would be to reduce the PAGE_SIZE in HintedHandOffManager.java to something like 10 (or even 1?) to reduce the memory pressure. The performance hit would be small since we are doing the hinted handoff throttle delay sleep before sending every *mutation* anyway (not every page). Thoughts?
> If anyone runs into the same problem, I got the node started again by simply removing the HintsColumnFamily* files.

-- 
This message is automatically generated by JIRA.
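
The paging argument in the quoted issue description can be illustrated with a small model. This is only a sketch under stated assumptions, not the code in HintedHandOffManager.java: the class `HintPagingSketch`, the method `replay`, and all sizes and delays are hypothetical, and it assumes that a whole page of mutations is held in memory at once and that the throttle sleep happens once per mutation, as the reporter describes.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of paged hint replay; not Cassandra's actual HintedHandOffManager code. */
final class HintPagingSketch {

    /**
     * Simulates replaying hints page by page without doing any real I/O or sleeping.
     * Returns {largest bytes held by one page, total throttle sleep in ms, number of page reads}.
     */
    static long[] replay(List<Long> mutationBytes, int pageSize, long throttleDelayMs) {
        if (pageSize < 1) {
            throw new IllegalArgumentException("pageSize must be at least 1");
        }
        long worstPageBytes = 0;
        long totalSleepMs = 0;
        long pageReads = 0;
        for (int start = 0; start < mutationBytes.size(); start += pageSize) {
            int end = Math.min(start + pageSize, mutationBytes.size());
            long pageBytes = 0;
            for (long size : mutationBytes.subList(start, end)) {
                pageBytes += size;               // assumption: the whole page is in memory at once
                totalSleepMs += throttleDelayMs; // assumption: throttle sleep is per mutation, not per page
            }
            worstPageBytes = Math.max(worstPageBytes, pageBytes);
            pageReads++;
        }
        return new long[] {worstPageBytes, totalSleepMs, pageReads};
    }

    public static void main(String[] args) {
        // Made-up workload: 2,000 hints of 10 MiB each, 1 ms throttle delay per mutation.
        Long[] sizes = new Long[2000];
        Arrays.fill(sizes, 10L * 1024 * 1024);
        List<Long> hints = Arrays.asList(sizes);
        for (int pageSize : new int[] {1000, 10, 1}) {
            long[] r = replay(hints, pageSize, 1L);
            System.out.printf("pageSize=%d peakPageBytes=%d totalSleepMs=%d pageReads=%d%n",
                    pageSize, r[0], r[1], r[2]);
        }
    }
}
```

In this model a smaller page size lowers the peak memory held per page while the total throttle sleep stays the same, and the only added cost is more page reads. That matches the reporter's reasoning only as long as the throttle stays per mutation, which the comment above notes is likely to change.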