From Leon Oosterwijk <>
Subject Cassandra 2.1.0 Crashes the JVM with OOM with heaps of memory free
Date Fri, 19 Dec 2014 03:55:13 GMT

We have a Cassandra cluster which seems to be struggling a bit. One node crashes
continually, and others crash sporadically. Each crash is the JVM reporting that it couldn't
allocate memory, even though there's heaps available. I suspect the cause is one very big
table (500GB), which has on the order of 500K-700K files in its directory. When I deleted
the directory contents on the crashing node and ran a repair, the nodes around this node crashed
while streaming the data. Here are the relevant bits from the crash file and environment.

Any help would be appreciated.

# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 12288 bytes for committing reserved memory.
# Possible reasons:
#   The system is out of physical RAM or swap space
#   In 32 bit mode, the process size limit was hit
# Possible solutions:
#   Reduce memory load on the system
#   Increase physical memory or swap space
#   Check if swap backing store is full
#   Use 64 bit Java on a 64 bit OS
#   Decrease Java heap size (-Xmx/-Xms)
#   Decrease number of Java threads
#   Decrease Java thread stack sizes (-Xss)
#   Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#  Out of Memory Error (os_linux.cpp:2671), pid=1104, tid=139950342317824
# JRE version: Java(TM) SE Runtime Environment (8.0_20-b26) (build 1.8.0_20-b26)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.20-b23 mixed mode linux-amd64 compressed
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

---------------  T H R E A D  ---------------

Current thread (0x00007f4acabb1800):  JavaThread "Thread-13" [_thread_new, id=19171, stack(0x00007f48ba6ca000,0x00007f48ba70b000)]

Stack: [0x00007f48ba6ca000,0x00007f48ba70b000],  sp=0x00007f48ba709a50,  free space=254k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  []  VMError::report_and_die()+0x2ca
V  []  report_vm_out_of_memory(char const*, int, unsigned long, VMErrorType, char const*)+0x8b
V  []  os::Linux::commit_memory_impl(char*, unsigned long, bool)+0x103
V  []  os::pd_commit_memory(char*, unsigned long, bool)+0xc
V  []  os::commit_memory(char*, unsigned long, bool)+0x2a
V  []  os::pd_create_stack_guard_pages(char*, unsigned long)+0x7f
V  []  JavaThread::create_stack_guard_pages()+0x5e
V  []  JavaThread::run()+0x34
V  []  java_start(Thread*)+0x108
C  []

Memory: 4k page, physical 131988232k(694332k free), swap 37748728k(37748728k free)

vm_info: Java HotSpot(TM) 64-Bit Server VM (25.20-b23) for linux-amd64 JRE (1.8.0_20-b26),
built on Jul 30 2014 13:13:52 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)

time: Fri Dec 19 14:37:29 2014
elapsed time: 2303 seconds (0d 0h 38m 23s)

OS:Red Hat Enterprise Linux Server release 6.5 (Santiago)

uname:Linux 2.6.32-431.5.1.el6.x86_64 #1 SMP Fri Jan 10 14:46:43 EST 2014 x86_64
libc:glibc 2.12 NPTL 2.12
rlimit: STACK 10240k, CORE 0k, NPROC 8192, NOFILE 65536, AS infinity
load average:4.18 4.79 4.54

MemTotal:       131988232 kB
MemFree:          694332 kB
Buffers:          837584 kB
Cached:         51002896 kB
SwapCached:            0 kB
Active:         93953028 kB
Inactive:       32850628 kB
Active(anon):   70851112 kB
Inactive(anon):  4713848 kB
Active(file):   23101916 kB
Inactive(file): 28136780 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      37748728 kB
SwapFree:       37748728 kB
Dirty:             75752 kB
Writeback:             0 kB
AnonPages:      74963768 kB
Mapped:           739884 kB
Shmem:            601592 kB
Slab:            3460252 kB
SReclaimable:    3170124 kB
SUnreclaim:       290128 kB
KernelStack:       36224 kB
PageTables:       189772 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    169736960 kB
Committed_AS:   92208740 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      492032 kB
VmallocChunk:   34291733296 kB
HardwareCorrupted:     0 kB
AnonHugePages:  67717120 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        5056 kB
DirectMap2M:     2045952 kB
DirectMap1G:    132120576 kB
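
To put "heaps of memory free" into numbers, the figures above can be boiled down with a short script. This is only a rough sketch: `summarize_meminfo` is a name made up for this post, and it assumes the standard /proc/meminfo field names. MemFree on its own looks tiny because most of the RAM is sitting in page cache, so the sketch adds Buffers and Cached to it. It also prints CommitLimit minus Committed_AS, which is indicative only since CommitLimit is not enforced unless strict overcommit is configured.

```shell
# summarize_meminfo [FILE]: hypothetical helper, reads /proc/meminfo by default.
# Prints (in kB) free memory plus page cache, and the gap between
# CommitLimit and Committed_AS.
summarize_meminfo() {
  awk '
    $1 == "MemFree:"      { free = $2 }
    $1 == "Buffers:"      { buffers = $2 }
    $1 == "Cached:"       { cached = $2 }
    $1 == "CommitLimit:"  { limit = $2 }
    $1 == "Committed_AS:" { committed = $2 }
    END {
      printf "free+buffers+cached: %d kB\n", free + buffers + cached
      printf "commit headroom:     %d kB\n", limit - committed
    }
  ' "${1:-/proc/meminfo}"
}

# Only run against the live system when /proc/meminfo is readable.
if [ -r /proc/meminfo ]; then
  summarize_meminfo
fi
```

For the values pasted above this works out to 52534812 kB for free+buffers+cached and 77528220 kB between CommitLimit and Committed_AS.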

Before you say it's a ulimit issue:
[501]> ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1030998
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 8192
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 8192
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
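
One caveat on the output above: it comes from an interactive shell, and the crash file's rlimit line reports NOFILE 65536 where the shell shows 8192 open files, so the two are not necessarily the same. The limits actually applied to a running process can be read from /proc. A minimal sketch, where `show_limits` is a made-up helper name and the `pgrep` pattern in the comment is an assumption about how the Cassandra process appears in the process list:

```shell
# show_limits [PID]: hypothetical helper, defaults to the current shell.
# Prints a few of the resource limits in effect for the given process.
show_limits() {
  pid="${1:-$$}"
  grep -E '^Max (open files|processes|locked memory|stack size|address space)' \
    "/proc/$pid/limits"
}

# Example against the current shell; for Cassandra one might instead pass
# something like "$(pgrep -f CassandraDaemon | head -n 1)" (assumed pattern).
if [ -r "/proc/$$/limits" ]; then
  show_limits
fi
```

Reading the file for the live Cassandra PID avoids guessing which limits the init script or service wrapper applied.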

Here's the file count on one of the nodes for this very big table:
> ls | wc -l
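
Since every SSTable is made up of several component files, the raw file count overstates the number of SSTables. A rough sketch for separating the two follows; `count_files` is a made-up name, and the `*-Data.db` pattern is an assumption about how the data component of each SSTable is named in this directory.

```shell
# count_files [DIR]: hypothetical helper, defaults to the current directory.
# Prints the total number of regular files and how many of them match the
# assumed SSTable data-file suffix.
count_files() {
  dir="${1:-.}"
  total=$(find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' ')
  data=$(find "$dir" -maxdepth 1 -type f -name '*-Data.db' | wc -l | tr -d ' ')
  echo "total files: $total, data files: $data"
}

count_files .
```

Using `find` rather than `ls` also avoids sorting several hundred thousand names before counting them.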



