ignite-user mailing list archives

From Arseny Kovalchuk <arseny.kovalc...@synesis.ru>
Subject Segmentation fault (JVM crash) while memory restoring on start with native persistence
Date Tue, 26 Dec 2017 09:54:31 GMT
Hi guys.

We have successfully tested Ignite as an in-memory solution, and it showed
acceptable performance. However, we cannot get an Ignite cluster to run
stably with native persistence enabled. The first error we hit is a
segmentation fault (JVM crash) while memory is being restored on start.

[2017-12-22 11:11:51,992]  INFO [exchange-worker-#46%ignite-instance-0%]
- Read checkpoint status
[2017-12-22 11:11:51,993]  INFO [exchange-worker-#46%ignite-instance-0%]
- Checking memory state [lastValidPos=FileWALPointer [idx=391,
fileOffset=220593830, len=19573, forceFlush=false],
lastMarked=FileWALPointer [idx=394, fileOffset=38532201, len=19573,
forceFlush=false], lastCheckpointId=8c574131-763d-4cfa-99b6-0ce0321d61ab]
[2017-12-22 11:11:51,993]  WARN [exchange-worker-#46%ignite-instance-0%]
- Ignite node stopped in the middle of checkpoint. Will restore memory
state and finish checkpoint on node start.
[CodeBlob (0x00007f9b58f24110)]
Framesize: 0
BufferBlob (0x00007f9b58f24110) used for StubRoutines (2)
# A fatal error has been detected by the Java Runtime Environment:
#  Internal Error (sharedRuntime.cpp:842), pid=221, tid=0x00007f9b473c1ae8
#  fatal error: exception happened outside interpreter, nmethods and vtable
stubs at pc 0x00007f9b58f248f6
# JRE version: OpenJDK Runtime Environment (8.0_151-b12) (build
# Java VM: OpenJDK 64-Bit Server VM (25.151-b12 mixed mode linux-amd64
compressed oops)
# Derivative: IcedTea 3.6.0
# Distribution: Custom build (Tue Nov 21 11:22:36 GMT 2017)
# Core dump written. Default location: /opt/ignite/core or core.221
# An error report file with more information is saved as:
# /ignite-work-directory/core_dump_221.log
# If you would like to submit a bug report, please include
# instructions on how to reproduce the bug and visit:
#   http://icedtea.classpath.org/bugzilla

Please find logs and configs attached.

We deploy Ignite along with our services in Kubernetes (v1.8) on premises.
The Ignite cluster is a StatefulSet of 5 Pods (5 instances) running Ignite
version 2.3. Each Pod mounts a PersistentVolume backed by Ceph RBD.

We put about 230 events/second into Ignite; 70% of the events are ~200KB in
size and 30% are 5000KB. The smaller events have indexed fields, and we
query them via SQL.
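
To put the load in perspective, here is a rough back-of-the-envelope
calculation based only on the figures above. It is a sketch, not a
measurement: the class and method names are made up for illustration, and
it assumes the 70/30 split and the two sizes are exact.

```java
// Hypothetical helper: estimates the sustained ingest rate from the
// event mix quoted in this message (70% at ~200KB, 30% at 5000KB, 230 events/s).
final class IngestRate {

    /** Weighted mean event size in KB; shares are whole percents that must sum to 100. */
    public static long meanEventKb(long smallPct, long smallKb, long largePct, long largeKb) {
        if (smallPct + largePct != 100) {
            throw new IllegalArgumentException("shares must sum to 100");
        }
        return (smallPct * smallKb + largePct * largeKb) / 100;
    }

    /** Sustained ingest rate in KB per second. */
    public static long ingestKbPerSecond(long eventsPerSecond, long meanKb) {
        return eventsPerSecond * meanKb;
    }

    public static void main(String[] args) {
        long meanKb = meanEventKb(70, 200, 30, 5000);   // 1640 KB
        long kbPerSec = ingestKbPerSecond(230, meanKb); // 377200 KB/s
        System.out.println("mean event size: " + meanKb + " KB");
        System.out.println("ingest rate: " + kbPerSec + " KB/s");
    }
}
```

Taken at face value, that is roughly 377 MB/s across the cluster (counting
1 MB as 1000 KB), or about 75 MB/s per node if the data were spread evenly
over the 5 instances and backups are ignored.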

The cluster is activated from a client node, which also streams events from
Kafka into Ignite. We use a custom streamer implementation that uses the
cache.putAll() API.
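
For readers unfamiliar with this pattern, the following is a minimal sketch
of what a putAll-based batcher might look like. It is not our actual
streamer: the class PutAllBatcher, its batch size, and the Consumer-based
sink are all hypothetical, chosen so the example runs without Ignite on the
classpath. With Ignite, the sink could be a method reference such as
cache::putAll. The batch is kept in a TreeMap on the assumption that a
consistent key order is desirable when several threads call putAll
concurrently.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

// Hypothetical batcher: collects entries and hands them to a sink in
// key-sorted batches. The sink stands in for something like cache::putAll.
final class PutAllBatcher<K extends Comparable<K>, V> {
    private final int batchSize;
    private final Consumer<Map<K, V>> sink;
    private TreeMap<K, V> pending = new TreeMap<>();

    public PutAllBatcher(int batchSize, Consumer<Map<K, V>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Buffers one entry and flushes once the batch is full. */
    public void add(K key, V value) {
        pending.put(key, value);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Sends the buffered entries, if any, to the sink as one sorted map. */
    public void flush() {
        if (pending.isEmpty()) {
            return;
        }
        Map<K, V> batch = pending;
        pending = new TreeMap<>(); // start a fresh buffer before handing off
        sink.accept(batch);
    }
}
```

A caller would invoke add() for each consumed Kafka record and flush() on
shutdown or on a timer, so that a partially filled batch is not left behind.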

We got the error when we stopped the cluster and started it again. It
happened on only one instance.

The general question is:

*Is it possible to tune (or implement) native persistence so that it just
reports an error or corrupted data, skips it, and continues to work without
the corrupted part, so that the cluster keeps operating regardless of
storage errors?*

Arseny Kovalchuk

Senior Software Engineer at Synesis
skype: arseny.kovalchuk
mobile: +375 (29) 666-16-16
LinkedIn Profile <http://www.linkedin.com/in/arsenykovalchuk/en>
