ignite-dev mailing list archives

From radha jai <jairadhah...@gmail.com>
Subject Ignite pod keeps crashing and failed to recover the node
Date Tue, 20 Aug 2019 08:26:33 GMT
Ignite has been deployed on Kubernetes with 3 replicas of the server
pod. The pods were up and running fine for 9 days. We have created 180
inventory tables and 204 transactional tables. The data was inserted
using the PyIgnite client with the cache.put() method. This is a very
slow operation because each insert is committed one at a time, so the
load does not run as a bulk-style insert. PyIgnite was inserting into
about 20 of the inventory tables simultaneously (20 different
threads/processes).
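
For illustration, here is a minimal sketch of how the per-row cache.put()
calls described above could be grouped into batches. It assumes the
installed PyIgnite version exposes put_all() on its cache objects. The
helper name put_in_batches, the batch size, and the host, port, and cache
name in the usage comment are hypothetical and not taken from our loader.

```python
from itertools import islice


def put_in_batches(cache, items, batch_size=1000):
    """Write (key, value) pairs to `cache` in chunks via put_all().

    `cache` is any object exposing put_all(dict), such as a PyIgnite cache.
    Returns the number of entries sent (duplicate keys inside one batch
    are counted once).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    written = 0
    while True:
        # Take up to batch_size pairs from the stream without loading it all.
        batch = dict(islice(iterator, batch_size))
        if not batch:
            break
        cache.put_all(batch)  # one request per batch instead of one per row
        written += len(batch)
    return written


# Hypothetical usage against a running cluster (host, port, and cache name
# are placeholders, not values from this deployment):
#
#   from pyignite import Client
#   client = Client()
#   client.connect("127.0.0.1", 10800)
#   cache = client.get_or_create_cache("inventory_table_1")
#   put_in_batches(cache, ((i, f"row-{i}") for i in range(10_000)))
#   client.close()
```

Whether batching like this helps in practice depends on the client version
and on how the tables were created, so it is only a sketch of the idea.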

After 9 days the cluster became unstable: one of the pods crashed and
failed to recover. Below is the error log:
{"type":"log","host":"ignite-cluster-ignite-esoc-2","level":"ERROR","system":"ignite-service","time":"2019-08-16T17:13:34,769Z","logger":"GridCachePartitionExchangeManager","timezone":"UTC","log":"Failed
to process custom exchange task: ClientCacheChangeDummyDiscoveryMessage
[reqId=6b5f6c50-a8c9-4b04-a461-49bfd0112eb0, cachesToClose=null,
startCaches=[BgwService]] java.lang.NullPointerException| at
org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.processClientCachesChanges(CacheAffinitySharedManager.java:635)|
at
org.apache.ignite.internal.processors.cache.GridCacheProcessor.processCustomExchangeTask(GridCacheProcessor.java:391)|
at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.processCustomTask(GridCachePartitionExchangeManager.java:2475)|
at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2620)|
at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2539)|
at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)|
at java.lang.Thread.run(Thread.java:748)"}
{"type":"log","host":"ignite-cluster-ignite-esoc-2","level":"WARN","system":"ignite-service","time":"2019-08-16T17:13:36,724Z","logger":"GridCacheDatabaseSharedManager","timezone":"UTC","log":"Ignite
node stopped in the middle of checkpoint. Will restore memory state and
finish checkpoint on node start."}

The error report file and ignite-config.xml have been attached for your info.

The heap memory and RAM configuration on each Ignite server container
is as follows:

Heap Memory: 32GB

RAM: 64GB

Default memory region:

cpu: 4

Persistence volume

wal_storage_size: 10GB

persistence_storage_size: 10GB


Thanks

With Regards

Radha
