Message-ID: <4E1C4C11.8000802@gmail.com>
Date: Tue, 12 Jul 2011 09:28:49 -0400
From: Chris Burroughs
To: user@cassandra.apache.org
Subject: Survey: Cassandra/JVM Resident Set Size increase

### Preamble

There have been several reports on the mailing list of the JVM running Cassandra using "too much" memory. That is, the resident set size is >> (max Java heap size + mmapped segments) and continues to grow until the process swaps, the kernel OOM killer comes along, or performance just degrades too far due to the lack of space for the page cache. It has been unclear from these reports whether there is a pattern. My hope here is that by comparing JVM versions, OS versions, JVM configuration, etc., we will find something. Thank you everyone for your time.

Some example reports:

- http://www.mail-archive.com/user@cassandra.apache.org/msg09279.html
- http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Very-high-memory-utilization-not-caused-by-mmap-on-sstables-td5840777.html
- https://issues.apache.org/jira/browse/CASSANDRA-2868
- http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/OOM-or-what-settings-to-use-on-AWS-large-td6504060.html
- http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-memory-problem-td6545642.html

For reference, theories include (in no particular order):

- memory fragmentation
- JVM bug
- OS/glibc bug
- direct memory
- swap-induced fragmentation
- some other bad interaction of cassandra/jdk/jvm/os/nio-insanity.

### Survey

1. Do you think you are experiencing this problem?
2. Why?
   (This is a good time to share a graph like http://www.twitpic.com/5fdabn or http://img24.imageshack.us/img24/1754/cassandrarss.png)
2. Are you using mmap? (If yes, be sure to have read http://wiki.apache.org/cassandra/FAQ#mmap , and explain how you have used pmap [or another tool] to rule out mmap and top deceiving you.)
3. Are you using JNA? Was mlockall successful (it's in the logs on startup)?
4. Is swap enabled? Are you swapping?
5. What version of Apache Cassandra are you using?
6. What is the earliest version of Apache Cassandra you recall seeing this problem with?
7. Have you tried the patch from CASSANDRA-2654?
8. What JVM and version are you using?
9. What OS and version are you using?
10. What are your JVM flags?
11. Have you tried limiting direct memory (-XX:MaxDirectMemorySize)?
12. Can you characterise how much GC your cluster is doing?
13. Approximately how many reads/writes per unit time is your cluster doing (per node or the whole cluster)?
14. How are your column families configured (key cache size, row cache size, etc.)?
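
For the mmap and swap questions, the accounting described in the preamble (resident set size versus max heap plus mmapped segments) can be roughed out from Linux's /proc files. The following is only a sketch, not a vetted tool: the function names are made up for illustration, the data directory `/var/lib/cassandra/data` is an assumed path, and it assumes a kernel that exposes `/proc/<pid>/smaps` and the `VmRSS`/`VmSwap` fields in `/proc/<pid>/status`. Subtracting the full max heap overstates the heap's share whenever the heap is not fully resident, so the remainder is a lower bound; it also still includes ordinary JVM non-heap memory such as thread stacks and the code cache.

```python
import re

# Mapping header line in /proc/<pid>/smaps:
#   start-end perms offset dev inode [pathname]
HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s+\S+\s+[0-9a-f]+\s+\S+\s+\d+\s*(.*)$")
RSS = re.compile(r"^Rss:\s+(\d+)\s+kB")
VM_FIELD = re.compile(r"(Vm\w+):\s+(\d+)\s+kB")


def parse_status_kb(status_text):
    """Return {field: kB} for the Vm* lines of /proc/<pid>/status."""
    fields = {}
    for line in status_text.splitlines():
        m = VM_FIELD.match(line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields


def rss_by_kind(smaps_text, data_dir):
    """Sum resident kB of mappings under data_dir versus everything else."""
    totals = {"sstable_mmap": 0, "other": 0}
    kind = "other"
    for line in smaps_text.splitlines():
        header = HEADER.match(line)
        if header:
            # A new mapping starts; classify it by its backing path.
            path = header.group(1)
            kind = "sstable_mmap" if path.startswith(data_dir) else "other"
            continue
        rss = RSS.match(line)
        if rss:
            totals[kind] += int(rss.group(1))
    return totals


def unexplained_kb(rss_kb, max_heap_kb, sstable_mmap_kb):
    """RSS left over after subtracting max heap and resident SSTable mmaps."""
    return rss_kb - max_heap_kb - sstable_mmap_kb


def report(pid, max_heap_mb, data_dir="/var/lib/cassandra/data"):
    """Read /proc for one process and print the breakdown in kB."""
    with open("/proc/%d/status" % pid) as f:
        status = parse_status_kb(f.read())
    with open("/proc/%d/smaps" % pid) as f:
        totals = rss_by_kind(f.read(), data_dir)
    rss = status.get("VmRSS", 0)
    print("VmRSS:          %d kB" % rss)
    print("VmSwap:         %d kB" % status.get("VmSwap", 0))
    print("SSTable mmaps:  %d kB" % totals["sstable_mmap"])
    print("Unexplained:    %d kB" % unexplained_kb(
        rss, max_heap_mb * 1024, totals["sstable_mmap"]))
```

Calling `report(pid, max_heap_mb)` with the Cassandra process ID and the configured `-Xmx` value in megabytes (as the same user as the process, or root) prints the breakdown; an "Unexplained" figure that keeps growing over days is the pattern this survey is asking about.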