We treat this type of crash as an indicator that the node might have hardware errors.

Did you check the RAM (e.g. with memtest86)?


Hello All,

The JVM is crashing on our Cassandra nodes. Restarting doesn't help for long.

Ring information:
$ bin/nodetool -h A ring;
Address  DC   Rack  Status  State   Load       Owns    Token
A        DC1  RAC1  Up      Normal  83.65 GB   25.00%  0
B        DC2  RAC1  Down    Normal  170.09 GB  0.00%   1
C        DC1  RAC1  Up      Normal  94.6 GB    25.00%  42535295865117307932921825928971026432
D        DC2  RAC1  Up      Normal  87 GB      0.00%   42535295865117307932921825928971026433
E        DC1  RAC1  Up      Normal  98.05 GB   25.00%  85070591730234615865843651857942052864
F        DC2  RAC1  Up      Normal  95.55 GB   0.00%   85070591730234615865843651857942052865
G        DC1  RAC1  Up      Normal  111.22 GB  25.00%  127605887595351923798765477786913079296
H        DC2  RAC1  Up      Normal  42.05 GB   0.00%   127605887595351923798765477786913079297
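
For reference, the tokens above look like the usual evenly spaced layout: each DC1 node sits at i * 2^127 / 4, and its DC2 counterpart uses the same token plus 1. Here is a minimal Python sketch of that arithmetic, assuming RandomPartitioner (token range 0 to 2^127). The names balanced_tokens, nodes_per_dc and dc_offset are made up for illustration and are not part of Cassandra.

```python
# Assumption: RandomPartitioner, whose token space is 0 .. 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(nodes_per_dc, dc_offset=0):
    """Return evenly spaced tokens for one datacenter.

    dc_offset shifts every token by a small constant so that a second
    datacenter does not reuse the exact tokens of the first one.
    """
    return [i * RING_SIZE // nodes_per_dc + dc_offset
            for i in range(nodes_per_dc)]


if __name__ == "__main__":
    # Four nodes per datacenter, as in the ring output above.
    for dc_name, offset in (("DC1", 0), ("DC2", 1)):
        for token in balanced_tokens(4, offset):
            print(dc_name, token)
```

Comparing the printed values with the Token column is a quick way to confirm that the ring itself is balanced, so the uneven Load values probably have another cause.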

Heap space = 10 GB
Memory on each node = 98 GB
Disk space on each node = 400 GB

The JVM crashes with segmentation faults, so we have to restart the nodes frequently.
Node B holds 170 GB of data and becomes CPU bound on restart, and it has not rejoined the ring for almost 7 hours now.

Java version:
$ java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

JVM Crash Error log:

# A fatal error has been detected by the Java Runtime Environment:
#  SIGSEGV (0xb) at pc=0x00002abc7ec41fbc, pid=14232, tid=1104185664
# JRE version: 6.0_24-b07
# Java VM: Java HotSpot(TM) 64-Bit Server VM (19.1-b02 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x30ffbc]
---------------  T H R E A D  ---------------

Current thread (0x000000004d374000):  GCTaskThread [stack: 0x0000000000000000,0x0000000000000000] [id=14243]

siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), si_addr=0x0000000000000010
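
When a node crashes repeatedly, it can help to pull the same few fields out of every crash log and compare them across crashes and nodes. The following is a rough Python sketch, not a complete parser: the regular expressions are assumptions based only on the excerpt above, and summarize_crash and HEADER_PATTERNS are hypothetical names.

```python
import re

# Patterns are guesses derived from the log excerpt above; other JVM
# versions may format these lines differently.
HEADER_PATTERNS = {
    "signal": re.compile(r"#\s+(SIG[A-Z]+)\s+\(0x[0-9a-f]+\)"),
    "pc": re.compile(r"\bpc=(0x[0-9a-f]+)"),
    "frame": re.compile(r"#\s+[A-Za-z]\s+\[([^\]]+)\]"),
    "thread": re.compile(r"Current thread \(0x[0-9a-f]+\):\s+(\w+)"),
    "si_addr": re.compile(r"si_addr=(0x[0-9a-f]+)"),
}

SAMPLE = """\
# A fatal error has been detected by the Java Runtime Environment:
#  SIGSEGV (0xb) at pc=0x00002abc7ec41fbc, pid=14232, tid=1104185664
# Problematic frame:
# V  [libjvm.so+0x30ffbc]
Current thread (0x000000004d374000):  GCTaskThread [stack: 0x0000000000000000,0x0000000000000000] [id=14243]
siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), si_addr=0x0000000000000010
"""


def summarize_crash(text):
    """Collect the first match of each pattern into a dict."""
    summary = {}
    for name, pattern in HEADER_PATTERNS.items():
        match = pattern.search(text)
        if match:
            summary[name] = match.group(1)
    return summary


if __name__ == "__main__":
    for key, value in summarize_crash(SAMPLE).items():
        print(key, value)
```

If the frame and thread type are identical in every crash, that may point towards a JVM bug. If they vary from crash to crash on a single node, faulty hardware becomes a more plausible explanation. This is only a heuristic.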


Any ideas or suggestions? Is there a preferred JVM version? There is nothing in the Cassandra logs that identifies what is going on.