ignite-issues mailing list archives

From "Ilya Kasnacheev (JIRA)" <j...@apache.org>
Subject [jira] [Created] (IGNITE-7540) Sequential checkpoints cause overwrite of already cleaned & freed offheap page
Date Thu, 25 Jan 2018 16:14:00 GMT
Ilya Kasnacheev created IGNITE-7540:
---------------------------------------

             Summary: Sequential checkpoints cause overwrite of already cleaned & freed offheap page
                 Key: IGNITE-7540
                 URL: https://issues.apache.org/jira/browse/IGNITE-7540
             Project: Ignite
          Issue Type: Bug
          Components: persistence
    Affects Versions: 2.4
            Reporter: Ilya Kasnacheev
            Assignee: Alexey Goncharuk


The sequence of events is as follows:

In GridCacheProcessor.onExchangeDone(), sharedCtx.database().waitForCheckpoint("caches stop")
is performed, and then the cache is destroyed and all its pages are freed and cleared
asynchronously.

However, it is entirely possible that the next checkpoint will start immediately after
waitForCheckpoint() returns. This is typical when a lot of data is being loaded into Ignite,
which leads to rapid checkpoint buffer depletion, and also when checkpoint frequency is
artificially increased, as is done in the reproducer.

Then the checkpointer will save (overwrite) the metadata page:
{code:java}
    at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlockPage(PageMemoryImpl.java:1330)
    at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:428)
    at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:422)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.saveStoreMetadata(GridCacheOffheapManager.java:375)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.onCheckpointBegin(GridCacheOffheapManager.java:163)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.markCheckpointBegin(GridCacheDatabaseSharedManager.java:2309)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.doCheckpoint(GridCacheDatabaseSharedManager.java:2088)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.body(GridCacheDatabaseSharedManager.java:2013)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:748){code}
This happens after the cache has already been destroyed, and even after the page has already
been zeroed by PageMemoryImpl$ClearSegmentRunnable.run().

Later, when a new cache is created, pageMem.acquirePage() in
GridCacheOffheapManager$GridCacheDataStore.getOrAllocatePartitionMetas() returns this page.
The page is expected to be zeroed but actually contains metadata for the old cache's
partition. The type == PageIO.T_PART_META check then returns true, so the page is treated as
already initialized, and the following exception is thrown, leading to cache state
inconsistency and data loss:
{code:java}
Caused by: java.lang.IllegalStateException: Failed to get page IO instance (page content is corrupted)
    at org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forVersion(IOVersions.java:83)
    at org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forPage(IOVersions.java:95)
    at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.init(PagesList.java:175)
    at org.apache.ignite.internal.processors.cache.persistence.freelist.FreeListImpl.<init>(FreeListImpl.java:370)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore$1.<init>(GridCacheOffheapManager.java:932)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.init0(GridCacheOffheapManager.java:929)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:1295)
    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:344)
    at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:3191)
    at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:2571)
    at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$IsolatedUpdater.receive(DataStreamerImpl.java:2096)
    at org.apache.ignite.internal.processors.datastreamer.DataStreamerUpdateJob.call(DataStreamerUpdateJob.java:140)
    at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.localUpdate(DataStreamProcessor.java:397)
    at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.processRequest(DataStreamProcessor.java:302)
    at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.access$000(DataStreamProcessor.java:59)
    at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor$1.onMessage(DataStreamProcessor.java:89)
    ... 6 more{code}
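
The ordering problem described above can be illustrated with a small, deterministic toy model. This is only a sketch under stated assumptions, not Ignite code: the class StaleCheckpointWriteSketch, its methods, the byte-array "pages" and the free list are all hypothetical stand-ins, and the value of T_PART_META here is arbitrary. It replays the steps in order (cache destroyed and page zeroed, a later checkpoint writes metadata into the freed page, a new cache acquires the page) and reports whether the new owner sees a stale type tag.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Toy model of the reported sequence. None of these names are Ignite APIs. */
class StaleCheckpointWriteSketch {
    /** Hypothetical type tag standing in for PageIO.T_PART_META (value is arbitrary). */
    static final byte T_PART_META = 1;

    /** Toy off-heap region: page 0 plays the old cache's partition metadata page. */
    static final byte[][] pages = new byte[1][8];

    /** Indexes of pages that have been freed and may be handed out again. */
    static final Deque<Integer> freePages = new ArrayDeque<>();

    /** Cache destruction: zero the page and return it to the free list. */
    static void destroyCache(int metaPageIdx) {
        Arrays.fill(pages[metaPageIdx], (byte) 0);
        freePages.push(metaPageIdx);
    }

    /** A later checkpoint begin that still saves metadata for the destroyed cache. */
    static void onCheckpointBegin(int metaPageIdx) {
        pages[metaPageIdx][0] = T_PART_META;
    }

    /** A new cache takes a free page and checks its type tag, expecting zeroes. */
    static boolean acquiredPageLooksInitialized() {
        int idx = freePages.pop();
        return pages[idx][0] == T_PART_META;
    }

    /** Replays the sequence; returns true if the new cache sees stale metadata. */
    static boolean runScenario(boolean checkpointAfterDestroy) {
        // Reset the toy region, then mark page 0 as the old cache's metadata page.
        Arrays.fill(pages[0], (byte) 0);
        freePages.clear();
        pages[0][0] = T_PART_META;

        // At this point the "caches stop" checkpoint is assumed to have completed.
        destroyCache(0);

        if (checkpointAfterDestroy)
            onCheckpointBegin(0); // the next checkpoint starts right away

        return acquiredPageLooksInitialized();
    }

    public static void main(String[] args) {
        System.out.println("checkpoint after destroy: " + runScenario(true));
        System.out.println("no checkpoint after destroy: " + runScenario(false));
    }
}
```

In this model the first scenario reports true (the freed page carries the old type tag again) and the second reports false. The real code paths involve concurrency and page locking that this sketch deliberately leaves out.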



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
