We treat this type of crash as an indicator that the node might have hardware errors.

Did you check the RAM (e.g. with memtest86)?
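
If it helps with triage, here is a rough Python sketch that pulls the signal, the problematic frame, and the crashing thread out of a HotSpot fatal error log (the hs_err_pid<pid>.log files the JVM writes on a crash). The function names and the rule in looks_like_hardware_suspect are my own assumptions, not anything Cassandra or the JVM provides. A SIGSEGV inside libjvm.so on a GC worker thread is only a hint; it can also be a JVM bug, so a memory test is still needed to confirm.

```python
import re
import sys


def summarize_hs_err(text):
    """Extract signal, problematic frame, and crashing thread from hs_err log text."""
    signal = re.search(r"^#\s+(SIG[A-Z]+) \(0x[0-9a-f]+\) at pc=", text, re.M)
    frame = re.search(r"^# Problematic frame:[ \t]*\n#[ \t]*(.+)$", text, re.M)
    thread = re.search(r"^Current thread \(0x[0-9a-f]+\):\s+(\S+)", text, re.M)
    return {
        "signal": signal.group(1) if signal else None,
        "frame": frame.group(1).strip() if frame else None,
        "thread": thread.group(1) if thread else None,
    }


def looks_like_hardware_suspect(summary):
    """Heuristic only (an assumption, not proof): SIGSEGV in libjvm.so on a GC thread."""
    frame = summary["frame"]
    return (
        summary["signal"] == "SIGSEGV"
        and frame is not None
        and "libjvm.so" in frame
        and summary["thread"] == "GCTaskThread"
    )


if __name__ == "__main__":
    # Usage: python triage_hs_err.py hs_err_pid14232.log [more logs...]
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            summary = summarize_hs_err(handle.read())
        flag = "suspect" if looks_like_hardware_suspect(summary) else "other"
        print(path, summary, flag)
```

Running it over the crash logs from every node shows whether the crashes cluster on particular machines, which would point at those boxes rather than at the JVM version.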


On Wed, Nov 2, 2011 at 2:03 PM, Jahangir Mohammed <md.jahangir27@gmail.com> wrote:
Hello All,

The JVM is crashing on our Cassandra nodes. Restarting doesn't help for long.

Ring information:
$ bin/nodetool -h A ring;
Address  DC   Rack  Status  State   Load       Owns    Token
A        DC1  RAC1  Up      Normal  83.65 GB   25.00%  0
B        DC2  RAC1  Down    Normal  170.09 GB  0.00%   1
C        DC1  RAC1  Up      Normal  94.6 GB    25.00%  42535295865117307932921825928971026432
D        DC2  RAC1  Up      Normal  87 GB      0.00%   42535295865117307932921825928971026433
E        DC1  RAC1  Up      Normal  98.05 GB   25.00%  85070591730234615865843651857942052864
F        DC2  RAC1  Up      Normal  95.55 GB   0.00%   85070591730234615865843651857942052865
G        DC1  RAC1  Up      Normal  111.22 GB  25.00%  127605887595351923798765477786913079296
H        DC2  RAC1  Up      Normal  42.05 GB   0.00%   127605887595351923798765477786913079297

Heap space = 10 GB
Memory on each node = 98 GB
Disk space on each node = 400 GB

The JVM crashes with segmentation faults, so we have to restart the nodes frequently.
Node B holds 170 GB and becomes CPU bound on restart, but it has not rejoined the ring for almost 7 hours now.

Java version:
$ java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

JVM Crash Error log:

# A fatal error has been detected by the Java Runtime Environment:
#  SIGSEGV (0xb) at pc=0x00002abc7ec41fbc, pid=14232, tid=1104185664
# JRE version: 6.0_24-b07
# Java VM: Java HotSpot(TM) 64-Bit Server VM (19.1-b02 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x30ffbc]
# If you would like to submit a bug report, please visit:

---------------  T H R E A D  ---------------

Current thread (0x000000004d374000):  GCTaskThread [stack: 0x0000000000000000,0x0000000000000000] [id=14243]

siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), si_addr=0x0000000000000010


Any ideas or suggestions? Is there a preferred JVM version? There is nothing in the Cassandra logs that identifies what's going on.