ignite-dev mailing list archives

From "Alexander Belyak (JIRA)" <j...@apache.org>
Subject [jira] [Created] (IGNITE-4940) GridCacheWriteBehindStore loses more data than necessary
Date Tue, 11 Apr 2017 07:29:41 GMT
Alexander Belyak created IGNITE-4940:
----------------------------------------

             Summary: GridCacheWriteBehindStore loses more data than necessary
                 Key: IGNITE-4940
                 URL: https://issues.apache.org/jira/browse/IGNITE-4940
             Project: Ignite
          Issue Type: Bug
    Affects Versions: 1.9
            Reporter: Alexander Belyak
            Priority: Minor


Unnecessary data loss happens when the underlying store slows down or fails while new data is being put into the cache:
1) A writer adds a new cache entry and checks the cache size.
2) If the cache size exceeds criticalSize (by default criticalSize = 1.5 * cacheSize), the writer tries to flush a single value synchronously.
At this point we have:
N flusher threads trying to flush data in batch mode
1+ writer threads trying to flush single values
Both writers and flushers use the updateStore procedure, but if updateStore gets an exception from the underlying store, it checks the cache size, and if the size is greater than criticalCacheSize, it logs a cache overflow event and returns true (as if the data had been successfully stored). The data is then removed from the write-behind cache.
Moreover, we can lose not just a single value but one or more whole batches if the flusher threads get a store exception on an overflowed cache.
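The failure path described above can be sketched as follows. This is a minimal, self-contained illustration of the flawed logic, not the actual GridCacheWriteBehindStore source; the names (updateStore, CACHE_CRITICAL_SIZE) only approximate the internals:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch of the buggy error path: on store failure with an overflowed
 *  write-behind cache, updateStore reports success, so the caller removes
 *  the entry and the value is silently lost. */
public class WriteBehindSketch {
    static final int CACHE_CRITICAL_SIZE = 3;

    final Map<Integer, String> writeCache = new ConcurrentHashMap<>();

    boolean updateStore(int key, String val, boolean storeFails) {
        if (storeFails) {
            // BUG: overflow is treated as "handled"; returning true makes
            // the caller drop the entry from the write-behind cache.
            if (writeCache.size() > CACHE_CRITICAL_SIZE) {
                System.out.println("Cache overflow, dropping entry " + key);
                return true; // data loss
            }
            return false; // entry stays in the cache for retry
        }
        return true; // stored successfully
    }

    public static void main(String[] args) {
        WriteBehindSketch s = new WriteBehindSketch();

        // Overflow the cache, then fail the store.
        for (int i = 0; i < 5; i++)
            s.writeCache.put(i, "val" + i);

        boolean reported = s.updateStore(0, "val0", true);
        System.out.println(reported ? "LOST" : "RETAINED"); // prints "LOST"
    }
}
```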
Reproduce:
{panel}
/**
     * Tests that cache would keep values if underlying store fails.
     *
     * @throws Exception If failed.
     */
    private void testStoreFailure(boolean writeCoalescing) throws Exception {
        delegate.setShouldFail(true);

        initStore(2, writeCoalescing);

        Set<Integer> exp;

        try {
            Thread timer = new Thread(new Runnable() {
                @Override
                public void run() {
                    try {
                        U.sleep(FLUSH_FREQUENCY*2);
                    } catch (IgniteInterruptedCheckedException e) {
                        assertTrue("Timer was interrupted", false);
                    }
                    delegate.setShouldFail(false);
                }
            });
            timer.start();
            exp = runPutGetRemoveMultithreaded(10, 100000);

            timer.join();

            info(">>> There are " + store.getWriteBehindErrorRetryCount() + " entries in RETRY state");

            // Despite that we set shouldFail flag to false, flush thread may just have caught an exception.
            // If we move store to the stopping state right away, this value will be lost. That's why this sleep
            // is inserted here to let all exception handlers in write-behind store exit.
            U.sleep(1000);
        }
        finally {
            shutdownStore();
        }

        Map<Integer, String> map = delegate.getMap();

        Collection<Integer> extra = new HashSet<>(map.keySet());

        extra.removeAll(exp);

        assertTrue("The underlying store contains extra keys: " + extra, extra.isEmpty());

        Collection<Integer> missing = new HashSet<>(exp);

        missing.removeAll(map.keySet());

        assertTrue("Missing keys in the underlying store: " + missing, missing.isEmpty());

        for (Integer key : exp)
            assertEquals("Invalid value for key " + key, "val" + key, map.get(key));
    }
{panel}
Solution: test the cache size before inserting a new value, either
a) with some kind of synchronization to prevent cacheSize from growing beyond criticalCacheSize (strong restriction), or
b) by removing the cache size test from updateStore - the cache can then grow beyond cacheCriticalSize at a single point if we get a race on updateCache...
I prefer (b) because of the lower synchronization pressure (the cache can store 1 or 2 extra elements).
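Option (b) can be sketched as below: the size check moves to the write path (where it already triggers synchronous flushing), and a store failure always keeps the entry for retry. This is an illustrative sketch with made-up names, not the actual Ignite code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch of fix option (b): the cache-size check lives only on the
 *  write path; updateStore never claims success on a store failure,
 *  so no entry is silently discarded. */
public class WriteBehindFixSketch {
    static final int CACHE_CRITICAL_SIZE = 3;

    final Map<Integer, String> writeCache = new ConcurrentHashMap<>();
    boolean storeFails; // stands in for a failing underlying store

    void write(int key, String val) {
        // Check before inserting; a race may briefly overshoot
        // criticalCacheSize by an element or two, which is acceptable.
        while (writeCache.size() >= CACHE_CRITICAL_SIZE && flushOne())
            ; // synchronous back-pressure, as the writer does today

        writeCache.put(key, val);
    }

    /** Tries to flush one entry; on failure the entry stays for retry. */
    boolean flushOne() {
        for (Map.Entry<Integer, String> e : writeCache.entrySet()) {
            if (updateStore(e.getKey(), e.getValue())) {
                writeCache.remove(e.getKey());
                return true;
            }
            return false; // store failed: keep the entry, stop flushing
        }
        return false;
    }

    boolean updateStore(int key, String val) {
        // No cache-size check here: a failing store never reports
        // success, so overflow can no longer cause silent data loss.
        return !storeFails;
    }

    public static void main(String[] args) {
        WriteBehindFixSketch s = new WriteBehindFixSketch();
        s.storeFails = true;

        for (int i = 0; i < 5; i++)
            s.write(i, "val" + i);

        // The cache overshoots the critical size, but nothing is lost.
        System.out.println("retained=" + s.writeCache.size()); // prints "retained=5"
    }
}
```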



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
