I remember running into this problem in LOG_ONLY mode. After a node rebooted, it printed an error message and could not be started again. I did not save the error message at the time, but after searching the source code I believe it was one of these two:
1. 'Failed to find checkpoint record at the given WAL pointer'
2. 'on disk, but checkpoint record is missed in WAL '

Does this mean that in LOG_ONLY mode a node may fail to start after a crash?

------------------ Original Message ------------------
From: "Pavel Vinokurov"<>;
Sent: Thursday, May 10, 2018, 5:13 PM
To: "user"<>;
Subject: Re: Read request response time is unstable, often more than 500 milliseconds, but the cluster load is small

Please try checking performance in LOG_ONLY mode.

2018-05-10 12:03 GMT+03:00 NO <>:

I have tested with -DIGNITE_WAL_FSYNC_WITH_DEDICATED_WORKER=true set, but it seriously reduces the write speed. I do not know what other effects this parameter has. Do I need to set other parameters to bring the write speed back up?

------------------ Original Message ------------------
From: "Pavel Vinokurov"<>;
Sent: Thursday, May 10, 2018, 4:59 PM
To: "user"<>;
Subject: Re: Read request response time is unstable, often more than 500 milliseconds, but the cluster load is small


I see several exceptions in your logs. They are probably what causes the slowdown.
>> java.lang.ClassCastException: org.apache.ignite.internal.processors.cache.persistence.wal.FsyncModeFileWriteAheadLogManager cannot be cast to org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager

It seems you have hit an issue that is fixed in version 2.5.
As a workaround, you could change WALMode to LOG_ONLY or start Ignite with the JVM property -DIGNITE_WAL_FSYNC_WITH_DEDICATED_WORKER=true
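
When trying the JVM-property workaround, it can help to confirm that the flag actually reached the JVM, for example when start scripts rebuild JVM_OPTS. The following is a minimal sketch using only the JDK; the class name WalFlagCheck is made up for illustration, and the sketch does not show how Ignite itself reads the property.

```java
// Sketch: confirm that a -D flag reached the JVM. Uses only the JDK; it does
// not touch Ignite and says nothing about how Ignite consumes the flag.
final class WalFlagCheck {
    // Property name taken from the message above.
    static final String FLAG = "IGNITE_WAL_FSYNC_WITH_DEDICATED_WORKER";

    // True only when the system property is set to "true" (case-insensitive),
    // e.g. because the JVM was started with -DIGNITE_WAL_FSYNC_WITH_DEDICATED_WORKER=true.
    static boolean dedicatedWorkerRequested() {
        return Boolean.getBoolean(FLAG);
    }

    public static void main(String[] args) {
        System.out.println(FLAG + " = " + dedicatedWorkerRequested());
    }
}
```

Running it with the same JVM_OPTS as the node shows whether the property is visible to the process.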


2018-05-10 5:42 GMT+03:00 NO <>:

Ignite version : 2.4.0

Read operations often exceed 500 milliseconds, but the cluster traffic is very small. I don't know why. Please help me solve this problem. Thank you very much. Here is some configuration information.

8 nodes : (48 cores, 192G RAM, 4TB SSD)
Cluster records : 1.7 billion primary keys , 1.7 billion backup keys
Get requests per second : 100+
requests per second : 400+
Each node occupies more than 500GB of disk space.
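
As a rough back-of-the-envelope check, the figures above can be combined with the data region maxSize of 100 GiB from the node configuration further down. This is only a sketch under stated assumptions: entries are spread evenly across nodes, "500GB" is read as GiB, and the on-disk figure also covers WAL and index files, so the result is an upper bound rather than a measurement. The class and method names are made up for illustration.

```java
// Sketch: how much of the persisted data can be resident in the data region at once.
final class WorkingSetEstimate {
    // Entries stored per node: primaries plus backups, assumed evenly distributed.
    static long entriesPerNode(long primaryKeys, long backupKeys, int nodes) {
        return (primaryKeys + backupKeys) / nodes;
    }

    // Upper bound on the share of on-disk data that fits in the data region.
    static double inMemoryFraction(long regionMaxBytes, long diskBytesPerNode) {
        return Math.min(1.0, (double) regionMaxBytes / diskBytesPerNode);
    }

    public static void main(String[] args) {
        long perNode = entriesPerNode(1_700_000_000L, 1_700_000_000L, 8);
        long regionMax = 100L * 1024 * 1024 * 1024;   // maxSize from the node configuration
        long diskPerNode = 500L * 1024 * 1024 * 1024; // "more than 500GB", read as GiB (assumption)
        System.out.printf("entries per node: %d%n", perNode);
        System.out.printf("in-memory share (upper bound): %.0f%%%n",
                100 * inMemoryFraction(regionMax, diskPerNode));
    }
}
```

Under these assumptions each node holds about 425 million entries and at most roughly a fifth of its persisted data fits in memory, so a sizeable share of random reads may have to go to disk.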

2 nodes :
LSB Version:    :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID:    CentOS
Description:    CentOS Linux release 7.2.1511 (Core)
Release:    7.2.1511
Codename:    Core

6 nodes:
LSB Version:    :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID:    CentOS
Description:    CentOS release 6.7 (Final)
Release:    6.7
Codename:    Final
The node configuration is as follows
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns=""
    <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
           <property name="failureDetectionTimeout" value="60000"/>       
        <property name="clientFailureDetectionTimeout" value="60000"/>               
        <property name="segmentationPolicy" value="RESTART_JVM"/>       
        <property name="publicThreadPoolSize" value="64"/>       
        <property name="systemThreadPoolSize" value="64"/>
        <property name="dataStreamerThreadPoolSize" value="64"/>
        <property name="rebalanceThreadPoolSize" value="4" />       
        <property name="dataStorageConfiguration">
            <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
                <property name="defaultDataRegionConfiguration">
                    <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                        <property name="name" value="qipu_entity_cache_data_region"/>
                        <property name="initialSize" value="#{10L * 1024 * 1024 * 1024}"/>
                        <property name="maxSize" value="#{100L * 1024 * 1024 * 1024}"/>
                        <property name="persistenceEnabled" value="true"/>
                        <property name="metricsEnabled" value="true"/>
                        <property name="checkpointPageBufferSize" value="#{1 * 1024 * 1024 * 1024}"/>
                <property name="walSegmentSize" value="#{64 * 1024 * 1024}"/>
                <property name="pageSize" value="#{4 * 1024}"/>
                <property name="walSegments" value="#{20}"/>
                <property name="walMode" value="FSYNC"/>
                <property name="metricsEnabled" value="true"/>
                <property name="writeThrottlingEnabled" value="true"/>               
                <property name="checkpointThreads" value="8"/>               
                <property name="walThreadLocalBufferSize" value="#{1 * 1024 * 1024}"/>
        <property name="cacheConfiguration">
            <bean class="org.apache.ignite.configuration.CacheConfiguration">
                <property name="dataRegionName" value="qipu_entity_cache_data_region"/>
                <property name="name" value="qipu_entity_cache"/>
                <property name="cacheMode" value="PARTITIONED"/>
                <property name="partitionLossPolicy" value="IGNORE"/>
                <property name="atomicityMode" value="ATOMIC"/>
                <property name="backups" value="1"/>
                <property name="writeSynchronizationMode" value="FULL_SYNC"/>
                <property name="statisticsEnabled" value="true"/>
                <property name="rebalanceBatchSize" value="#{20 * 1024 * 1024}"/>
                <property name="rebalanceThrottle" value="0"/>               
                <property name="rebalanceMode" value="ASYNC"/>           
                <property name="rebalanceBatchesPrefetchCount" value="4"/>               
                <property name="rebalanceTimeout" value="20000"/>               
                <property name="maxConcurrentAsyncOperations" value="#{4 * 500}"/>
        <property name="communicationSpi">
            <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
                <property name="messageQueueLimit" value="20480"/>
        <property name="discoverySpi">
            <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
                <property name="forceServerMode" value="true"/>
                <property name="ipFinder">
                    <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
                        <property name="addresses">
                                <!-- In distributed environment, replace with actual host IP address. -->
        <property name="gridLogger">
            <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
                <constructor-arg type="java.lang.String" value="/home/qipu/production/apache-ignite-2.4.0/config/ignite-log4j2.xml"/>
JVM config
JVM_OPTS="-Xms24g -Xmx24g -server -XX:+AggressiveOpts -XX:MaxMetaspaceSize=512m"
JVM_OPTS="${JVM_OPTS} -XX:+AlwaysPreTouch"
JVM_OPTS="${JVM_OPTS} -XX:+ScavengeBeforeFullGC"
JVM_OPTS="${JVM_OPTS} -XX:+DisableExplicitGC"
JVM_OPTS="${JVM_OPTS} -XX:+HeapDumpOnOutOfMemoryError "
JVM_OPTS="${JVM_OPTS} -XX:HeapDumpPath=${IGNITE_HOME}/work"
JVM_OPTS="${JVM_OPTS} -XX:+PrintGCDetails"
JVM_OPTS="${JVM_OPTS} -XX:+PrintGCTimeStamps"
JVM_OPTS="${JVM_OPTS} -XX:+PrintGCDateStamps"
JVM_OPTS="${JVM_OPTS} -XX:+UseGCLogFileRotation"
JVM_OPTS="${JVM_OPTS} -XX:NumberOfGCLogFiles=10"
JVM_OPTS="${JVM_OPTS} -XX:GCLogFileSize=100M"
JVM_OPTS="${JVM_OPTS} -Xloggc:${IGNITE_HOME}/work/gc.log"
JVM_OPTS="${JVM_OPTS} -XX:+PrintAdaptiveSizePolicy"
JVM_OPTS="${JVM_OPTS} -XX:MaxGCPauseMillis=100"
node config
fs.file-max = 512000
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
net.core.rmem_default = 65536
net.core.wmem_default = 65536
net.core.netdev_max_backlog = 4096
net.core.somaxconn = 4096
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 1200
net.ipv4.ip_local_port_range = 10000 65000
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_max_tw_buckets = 5000
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
net.ipv4.tcp_mtu_probing = 1
vm.zone_reclaim_mode = 0
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 500
*       soft    nofile          65535
*       hard    nofile          65535

# End of file
*               soft    nofile             65535
*               hard    nofile             65535
*       soft    nofile          81920
*       hard    nofile          81920
*       soft    nproc           81920
*       hard    nproc           81920
*       soft    core            10240
*       hard    core            10240
*    soft    data       unlimited
*    hard    data       unlimited
*    soft    stack      unlimited
*    hard    stack      unlimited
*    soft    memory     unlimited
*    hard    memory     unlimited
*    soft    cpu        unlimited
*    hard    cpu        unlimited
*    soft    memlock    unlimited
*    hard    memlock    unlimited

* hard memlock      unlimited
* soft memlock      unlimited

client code

        IgniteConfiguration cfg = new IgniteConfiguration();
        TcpDiscoverySpi spi = new TcpDiscoverySpi();

        TcpDiscoveryVmIpFinder finder = new TcpDiscoveryVmIpFinder();

        cfg.setGridLogger(new Slf4jLogger());
        Ignite ignite = Ignition.start(cfg);
        IgniteCache<String, byte[]> igniteCache =

        // get code
        // Read operation response time often exceeds 1s

        // put code
        // cache.putAllAsync(map).get(3000);
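
To put numbers on "often exceeds 1s", read latency could be sampled on the client and summarized as percentiles. The following is a minimal sketch using only the JDK; ReadLatencyProbe and its methods are hypothetical names, and the Supplier stands in for the real igniteCache.get(key) call, which is not shown in the snippet above.

```java
import java.util.Arrays;
import java.util.function.Supplier;

// Sketch: sample read latencies and report nearest-rank percentiles.
final class ReadLatencyProbe {
    // Times `samples` calls of `read`; returns latencies in nanoseconds, sorted ascending.
    static long[] measure(Supplier<?> read, int samples) {
        long[] nanos = new long[samples];
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            read.get();
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        return nanos;
    }

    // Nearest-rank percentile over an ascending array; p is expected in (0, 100].
    static long percentile(long[] sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        // Stand-in for igniteCache.get(key); replace with the real read call.
        Supplier<byte[]> fakeRead = () -> new byte[16];
        long[] sorted = measure(fakeRead, 1_000);
        System.out.printf("p50=%dus p99=%dus max=%dus%n",
                percentile(sorted, 50) / 1_000,
                percentile(sorted, 99) / 1_000,
                sorted[sorted.length - 1] / 1_000);
    }
}
```

Comparing p50 with p99 and the maximum helps tell an occasional slow read apart from a generally slow read path, which is useful context alongside the GC and node logs.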

Attached are one node's GC log and node log.

Please give some suggestions on how to reduce the read operation response time. Thank you.



Pavel Vinokurov


