ignite-issues mailing list archives

From "Matija Polajnar (Jira)" <j...@apache.org>
Subject [jira] [Updated] (IGNITE-12375) Inconsistent persistent cache behaviour: containsKey returns false on a key returned by iterator
Date Fri, 15 Nov 2019 11:51:00 GMT

     [ https://issues.apache.org/jira/browse/IGNITE-12375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matija Polajnar updated IGNITE-12375:
-------------------------------------
    Description: 
In a fairly complex Spring Boot application using embedded Ignite persistent storage, we have
repeatedly ended up in a situation where some persistent caches start behaving inconsistently.
The symptoms are as follows: a cache's {{iterator()}} method returns the elements we previously
put into the cache, as expected, and {{size()}} also returns the expected value. But
{{containsKey(...)}} and {{get(...)}} return {{false}} and {{null}} respectively for some (or
all) keys that are expected to be in the cache and are even returned by {{iterator()}}.

The problem never starts mid-run; it appears only after a cluster restart, though not after
every restart. We suspect a necessary precondition is that the cache configuration has changed
slightly, for example through modified QueryEntities. We also suspect this only happens on
single-node clusters, so it might be related to IGNITE-12297, but the workaround that works
for that issue does not fix the problem described here.

The affected caches then cannot be repaired short of destroying them, re-creating them and
re-importing the data.

 

We tried and failed to reproduce the problem from scratch in a small demo application. We
managed, however, to grab a {{work}} directory from our application after corruption and then
create a demo application with a minimal set of classes needed to demonstrate the issue on
reading (after corruption is already present).

I'm attaching a zip file with the code (along with a Maven pom.xml) and the corrupted work
directory. You can run the demo directly with {{mvn compile exec:java}}, which executes the
{{care.better.demo.ignitebug.BugApp}} class. That class contains the following method:

{code:java}
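    // Walks the cache via its iterator and counts the keys that lookups can also find.
    private static void replicateProblem(IgniteCache<Object, Object> cache) {
        int seen = 0;
        Iterator<Cache.Entry<Object, Object>> entryIterator = cache.iterator();
        while (entryIterator.hasNext()) {
            Object key = entryIterator.next().getKey();
            // The key was just returned by the iterator, so both lookups should succeed.
            if (!cache.containsKey(key) || cache.get(key) == null) {
                LOG.error("UNSEEN KEY: {}", key);
            } else {
                seen++;
            }
        }
        // On a healthy cache the two numbers are equal.
        LOG.info("Size {}, seen {}.", cache.size(), seen);
    }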
{code}
 
After execution you will see log records like this one:

ERROR care.better.demo.ignitebug.BugApp.replicateProblem - UNSEEN KEY: QueueKey{affinityKey=PartyIdArg{namespace='ЭМИАС Медработники', id='222'}, entryId=c059b587-78d3-4c75-b64f-8575ae3d2318}
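
For readers who want to see the same symptom outside Ignite, here is a minimal, self-contained sketch. It is an analogy only and makes no claim about the cause of this issue: it assumes a plain {{java.util.HashMap}} and a hypothetical {{MutableKey}} class whose hash code changes after insertion, which is one well-known way for iteration and lookup to disagree. The names {{IteratorLookupCheck}}, {{MutableKey}} and {{unseenKeys}} are invented for illustration and are not part of the attached demo.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class IteratorLookupCheck {

    /** Hypothetical key type: its mutable field feeds equals() and hashCode(). */
    static final class MutableKey {
        String id;

        MutableKey(String id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof MutableKey && Objects.equals(id, ((MutableKey) o).id);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(id);
        }

        @Override
        public String toString() {
            return "MutableKey{id='" + id + "'}";
        }
    }

    /** Returns the keys that iteration yields but containsKey/get cannot find. */
    static <K, V> List<K> unseenKeys(Map<K, V> map) {
        List<K> unseen = new ArrayList<>();
        for (Map.Entry<K, V> entry : map.entrySet()) {
            K key = entry.getKey();
            // Same condition as in replicateProblem, applied to a plain Map.
            if (!map.containsKey(key) || map.get(key) == null) {
                unseen.add(key);
            }
        }
        return unseen;
    }

    public static void main(String[] args) {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey stable = new MutableKey("111");
        MutableKey drifting = new MutableKey("222");
        map.put(stable, "a");
        map.put(drifting, "b");

        // Mutating the field after insertion changes hashCode(), so the lookup
        // no longer matches the hash stored with the entry.
        drifting.id = "333";

        System.out.println("Size " + map.size() + ", unseen " + unseenKeys(map));
    }
}
```

Running {{main}} reports a size of 2 with one unseen key, which mirrors the shape of the log output above: the entry is still stored and still iterated, but a lookup by key misses it.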

We found no leads while debugging through the Ignite source code, so we kindly ask for your
assistance in hunting down this bug and, until it is fixed, for any possible workaround should
this occur in a production environment (it has not so far), where it is not practical to dump
all data from a cache into a file in order to destroy, re-create and re-import it.



> Inconsistent persistent cache behaviour: containsKey returns false on a key returned by iterator
> ------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-12375
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12375
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.7, 2.7.6
>            Reporter: Matija Polajnar
>            Priority: Major
>         Attachments: ignite-bug.zip



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
