ignite-dev mailing list archives

From Ivan Fedotov <ivanan...@gmail.com>
Subject Re: Apache Ignite 2.7. Last Mile
Date Mon, 03 Dec 2018 10:03:12 GMT
Nikolay,

I think an end user may hit this problem when calling IgniteCache#invoke
on a cache with a registered continuous query, if the cache configuration
matches the one in the failing test: [PARTITIONED, ATOMIC, FULL_SYNC, 2 backups].

I've found that the failure was introduced by the MVCC commit [1]. As I
understand it, the issue relates to the metadata update process: the binary
metadata registration future hangs for an unclear reason.

I don't know whether the issue is a blocker, but it seems to be a regression,
because the test passed on Ignite 2.6.

What do you think?

[1]
https://github.com/apache/ignite/commit/51a202a4c48220fa919f47147bd4889033cd35a8

On Mon, 3 Dec 2018 at 11:14, Nikolay Izhikov <nizhikov@apache.org> wrote:

> Ivan, please clarify.
>
> How is your investigation related to the 2.7 release?
> Do you think it's a release blocker?
> If yes, please describe the impact on users and how users can reproduce
> this issue.
>
> On Mon, 3 Dec 2018 at 9:30, Ivan Fedotov <ivanan639@gmail.com> wrote:
>
> > I've created the PR <https://github.com/apache/ignite/pull/5550>, which
> > includes the changes <https://github.com/1vanan/ignite/commits/before-MVCC>
> > made just before MVCC was integrated with Continuous Query. The TeamCity
> > run
> > <https://ci.ignite.apache.org/viewLog.html?buildId=2434057&tab=buildResultsDiv&buildTypeId=IgniteTests24Java8_ContinuousQuery1>
> > shows that before these changes the test
> > testAtomicOnheapTwoBackupAsyncFullSync is green.
> >
> > Roman Kondakov also gave his view on this problem in the comments
> > <https://issues.apache.org/jira/browse/IGNITE-10376>. The problem is now
> > better understood, but the root cause is still unclear.
> >
> > Does anyone have suggestions as to why threads hang on the binary
> > metadata registration future?
> >
> > On Fri, 30 Nov 2018 at 13:48, Ivan Fedotov <ivanan639@gmail.com> wrote:
> >
> > > Igor, thank you for the explanation.
> > >
> > > So it seems that while one thread is calling GridCacheMapEntry#touch,
> > > another one is executing GridCacheProcessor#stopCache. If I am wrong,
> > > please feel free to correct me.
> > >
> > > But it is still not clear to me why this failure appears after the
> > > commit
> > > <https://github.com/apache/ignite/commit/51a202a4c48220fa919f47147bd4889033cd35a8>,
> > > which is about MVCC. Moreover, the NPE appears only together with a
> > > BinaryObjectException; when the test is green, I cannot find any NPE in
> > > the log.
> > >
> > > I have now run the test locally 1000 times on the version before MVCC
> > > and could not reproduce this particular error (though there is another
> > > one
> > > <https://github.com/apache/ignite/blob/master/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/query/continuous/CacheContinuousQueryOrderingEventTest.java#L426>,
> > > an assertion on received events).
> > >
> > > On Fri, 30 Nov 2018 at 13:37, Roman Kondakov <kondakov87@mail.ru.invalid> wrote:
> > >
> > >> Nikolay,
> > >>
> > >> I couldn't quickly find the root cause of this problem because I'm not
> > >> an expert in the binary metadata flow. I think the community should
> > >> decide whether this is a release blocker or not.
> > >>
> > >>
> > >> --
> > >> Kind Regards
> > >> Roman Kondakov
> > >>
> > >> On 30.11.2018 13:23, Nikolay Izhikov wrote:
> > >> > Hello, Roman.
> > >> >
> > >> > Does this issue block the 2.7 release?
> > >> >
> > >> > On Fri, 30 Nov 2018 at 13:19, Roman Kondakov <kondakov87@mail.ru.invalid> wrote:
> > >> >
> > >> >> Hi all!
> > >> >>
> > >> >> I've reproduced this problem locally and attached the log to the
> > >> >> ticket in my comment [1].
> > >> >>
> > >> >> As Igor noted, the NPE there is caused by the node stop at the end of
> > >> >> the test. The real problem here seems to be in the binary metadata
> > >> >> registration flow.
> > >> >>
> > >> >>
> > >> >> [1]
> > >> >> https://issues.apache.org/jira/browse/IGNITE-10376?focusedCommentId=16704510&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16704510
> > >> >>
> > >> >> --
> > >> >> Kind Regards
> > >> >> Roman Kondakov
> > >> >>
> > >> >> On 30.11.2018 11:56, Seliverstov Igor wrote:
> > >> >>> The null pointer there is due to the cache stop. Look at
> > >> >>> GridCacheContext#cleanup (GridCacheContext.java:2050),
> > >> >>> which is called by GridCacheProcessor#stopCache
> > >> >>> (GridCacheProcessor.java:1372).
> > >> >>>
> > >> >>> That's why, by the time GridCacheMapEntry#touch
> > >> >>> (GridCacheMapEntry.java:5063) is invoked, there is no eviction manager.
> > >> >>>
> > >> >>> This is a result of the "normal" flow, because message processing
> > >> >>> doesn't enter the cache gate the way the user API does.
> > >> >>>
> > >> >>> On Fri, 30 Nov 2018 at 10:26, Nikolay Izhikov <nizhikov@apache.org> wrote:
> > >> >>>
> > >> >>>> Ivan, please provide a link to the ticket with the NPE stack trace
> > >> >>>> attached.
> > >> >>>>
> > >> >>>> I've looked at IGNITE-10376 and can't see any attachments.
> > >> >>>>
> > >> >>>> On Fri, 30 Nov 2018 at 10:14, Ivan Fedotov <ivanan639@gmail.com> wrote:
> > >> >>>>
> > >> >>>>> Igor,
> > >> >>>>> the NPE is available in the full log; I have now also attached it
> > >> >>>>> to the ticket.
> > >> >>>>>
> > >> >>>>> IGNITE-7953
> > >> >>>>> <https://github.com/apache/ignite/commit/51a202a4c48220fa919f47147bd4889033cd35a8>
> > >> >>>>> was committed on 15 October. I could not look at
> > >> >>>>> testAtomicOnheapTwoBackupAsyncFullSync before that date, because
> > >> >>>>> the oldest run of this test in the TC history dates from 12 November.
> > >> >>>>>
> > >> >>>>> So I tested it locally and could not reproduce the error mentioned.
> > >> >>>>>
> > >> >>>>> On Thu, 29 Nov 2018 at 20:07, Seliverstov Igor <gvvinblade@gmail.com> wrote:
> > >> >>>>>
> > >> >>>>>> Ivan,
> > >> >>>>>>
> > >> >>>>>> Could you provide a bit more detail?
> > >> >>>>>>
> > >> >>>>>> I don't see any NPE in any of the available logs.
> > >> >>>>>>
> > >> >>>>>> I don't think the issue is caused by the changes made in the scope
> > >> >>>>>> of IGNITE-7953. The test fails both before
> > >> >>>>>> <https://ci.ignite.apache.org/viewLog.html?buildId=2318582&tab=buildResultsDiv&buildTypeId=IgniteTests24Java8_ContinuousQuery4#testNameId3300126853696550025>
> > >> >>>>>> and after
> > >> >>>>>> <https://ci.ignite.apache.org/viewLog.html?buildId=2345403&tab=buildResultsDiv&buildTypeId=IgniteTests24Java8_ContinuousQuery4#testNameId3300126853696550025>
> > >> >>>>>> the commit was merged to master, with almost the same stack trace.
> > >> >>>>>>
> > >> >>>>>> Regards,
> > >> >>>>>> Igor
> > >> >>>>>>
> > >> >>>>>> On Thu, 29 Nov 2018 at 18:43, Yakov Zhdanov <yzhdanov@apache.org> wrote:
> > >> >>>>>>
> > >> >>>>>>> Vladimir, can you please take a look at
> > >> >>>>>>> https://issues.apache.org/jira/browse/IGNITE-10376?
> > >> >>>>>>>
> > >> >>>>>>> --Yakov
> > >> >>>>>>>
> > >> >>>>> --
> > >> >>>>> Ivan Fedotov.
> > >> >>>>>
> > >> >>>>> ivanan639@gmail.com
> > >> >>>>>
> > >>
> > >
> > >
> > > --
> > > Ivan Fedotov.
> > >
> > > ivanan639@gmail.com
> > >
> >
> >
> > --
> > Ivan Fedotov.
> >
> > ivanan639@gmail.com
> >
>


-- 
Ivan Fedotov.

ivanan639@gmail.com
