ignite-issues mailing list archives

From "Nikolai Kulagin (Jira)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-11087) GridJobCheckpointCleanupSelfTest.testCheckpointCleanup is flaky
Date Thu, 17 Oct 2019 19:54:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-11087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954060#comment-16954060 ]

Nikolai Kulagin commented on IGNITE-11087:
------------------------------------------

Because the test task is very short, CheckpointRequestListener (which handles the message about
saving a checkpoint) and the #onSessionEnd method (which runs after the task finishes) can
execute concurrently. At the same moment the task node adds the sessionId to closedSess, and the
listener finds that sessionId there. The task node removes the key set for this session from
keyMap and removes the checkpoint for each key in it.
{code:java}
closedSess.add(ses.getId());

// If on task node.
if (ses.getJobId() == null) {
    Set<String> keys = keyMap.remove(ses.getId());

    if (keys != null) {
        for (String key : keys)
            getSpi(ses.getCheckpointSpi()).removeCheckpoint(key);
    }
}{code}
The listener also removes the key set from keyMap and removes the checkpoint (even if the key
set was no longer in the map).
{code:java}
if (closedSess.contains(sesId)) {
    keyMap.remove(sesId, keys);

    getSpi(req.getCheckpointSpi()).removeCheckpoint(req.getKey());
}{code}
To fix this, the listener should check whether keyMap still contained the key set when removing
it, and delete the checkpoint only if it did.
{code:java}
if (closedSess.contains(sesId)) {
    if (keyMap.remove(sesId, keys)) 
        getSpi(req.getCheckpointSpi()).removeCheckpoint(req.getKey());
}
{code}
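The effect of the guard can be replayed in a single thread. The following is only a sketch, not Ignite code: the class name, the "ses-1" and "cp-key" values, and the removal counter (which stands in for calls to removeCheckpoint) are assumptions made for illustration. It relies on the fact that ConcurrentMap#remove(key, value) returns true only when the mapping was actually present and removed.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the checkpoint manager state; not the real Ignite class.
class ConditionalRemoveSketch {
    /** Counts checkpoint removals when the task node wins the race against the listener. */
    static int removals(boolean guarded) {
        ConcurrentMap<String, Set<String>> keyMap = new ConcurrentHashMap<>();
        Set<String> keys = ConcurrentHashMap.newKeySet();
        keys.add("cp-key");
        keyMap.put("ses-1", keys);

        int removals = 0;

        // Task node path (#onSessionEnd): removes the mapping and every checkpoint in it.
        Set<String> removed = keyMap.remove("ses-1");
        if (removed != null)
            removals += removed.size();

        // Listener path: the session is already closed at this point.
        if (guarded) {
            // remove(key, value) returns false here because the mapping is already gone.
            if (keyMap.remove("ses-1", keys))
                removals++;
        }
        else {
            // Original behaviour: the result is ignored and the checkpoint is removed again.
            keyMap.remove("ses-1", keys);
            removals++;
        }

        return removals;
    }

    public static void main(String[] args) {
        System.out.println("unguarded: " + removals(false)); // unguarded: 2
        System.out.println("guarded:   " + removals(true));  // guarded:   1
    }
}
```

The unguarded count of 2 corresponds to the "#removeCheckpoint is called once more" failure.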
After this fix, a new bug appears.

Between creating the new key set and adding the checkpoint key in the listener,
{code:java}
    Set<String> old = keyMap.putIfAbsent(sesId, (CheckpointSet)(keys = new CheckpointSet(ses)));

    if (old != null)
        keys = old;
}
// <-------------- here
keys.add(req.getKey());
{code}
the task node adds the session to closedSess and removes the still-empty key set for the
session, but finds no keys in it (because the listener has not added the key yet), so it does
not remove the checkpoint.
{code:java}
Set<String> keys = keyMap.remove(ses.getId());

if (keys != null) {
    for (String key : keys){code}
After adding the key, the listener no longer finds the key set in keyMap, so it does not remove the checkpoint either.
{code:java}
if (closedSess.contains(sesId)) {
    if (keyMap.remove(sesId, keys)){code}
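The interleaving described above can be replayed deterministically in one thread. This is a sketch under stated assumptions, not the actual Ignite implementation: the class name, the "ses-1" and "cp-key" values, and the storedCheckpoints set (a stand-in for the checkpoint SPI) are hypothetical. It only shows that, with the guarded removal in place, neither side removes the checkpoint when the task node runs inside the gap.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical single-threaded replay of the second race; not the real Ignite class.
class LostCleanupReplay {
    /** Returns the checkpoints that are still stored after both sides have finished. */
    static Set<String> replay() {
        ConcurrentMap<String, Set<String>> keyMap = new ConcurrentHashMap<>();
        Set<String> closedSess = ConcurrentHashMap.newKeySet();
        Set<String> storedCheckpoints = ConcurrentHashMap.newKeySet(); // stand-in for the SPI

        String sesId = "ses-1";
        String cpKey = "cp-key";
        storedCheckpoints.add(cpKey); // the job has saved its checkpoint

        // Listener, step 1: register an empty key set for the session.
        Set<String> keys = ConcurrentHashMap.newKeySet();
        Set<String> old = keyMap.putIfAbsent(sesId, keys);
        if (old != null)
            keys = old;

        // Task node runs in the gap: closes the session and removes the empty key set.
        closedSess.add(sesId);
        Set<String> taskKeys = keyMap.remove(sesId);
        if (taskKeys != null) {
            for (String key : taskKeys)
                storedCheckpoints.remove(key); // never executes: the set is still empty
        }

        // Listener, step 2: adds the key, then runs the guarded removal.
        keys.add(cpKey);
        if (closedSess.contains(sesId)) {
            if (keyMap.remove(sesId, keys)) // false: the mapping was removed by the task node
                storedCheckpoints.remove(cpKey);
        }

        return storedCheckpoints;
    }

    public static void main(String[] args) {
        System.out.println(replay()); // [cp-key]
    }
}
```

The leftover "cp-key" entry corresponds to the "#removeCheckpoint isn't called" failure.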
 

> GridJobCheckpointCleanupSelfTest.testCheckpointCleanup is flaky
> ---------------------------------------------------------------
>
>                 Key: IGNITE-11087
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11087
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Nikolai Kulagin
>            Assignee: Nikolai Kulagin
>            Priority: Minor
>              Labels: MakeTeamcityGreenAgain
>         Attachments: #removeCheckpoint is called once more.txt, #removeCheckpoint isn't called.txt
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The method that removes a checkpoint is sometimes not called, or is called one extra time. The test has a very low failure rate: 1 per 366 runs on [TeamCity|https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-7655052229521669617&tab=testDetails&branch_IgniteTests24Java8=%3Cdefault%3E] and 1 per 412 on TC Bot. On a local machine it fails approximately once per 100 runs. Logs are in the attachments.
> The test has been flaky for a long time. Before the IP Finder was replaced in IGNITE-10555, the test was slower, which made the failure rate even lower.
>  
> {code:java}
> [2019-01-25 14:49:03,050][ERROR][main][root] Test failed.
> junit.framework.AssertionFailedError: expected:<1> but was:<0>
> at junit.framework.Assert.fail(Assert.java:57)
> at junit.framework.Assert.failNotEquals(Assert.java:329)
> at junit.framework.Assert.assertEquals(Assert.java:78)
> at junit.framework.Assert.assertEquals(Assert.java:234)
> at junit.framework.Assert.assertEquals(Assert.java:241)
> at org.apache.ignite.internal.GridJobCheckpointCleanupSelfTest.testCheckpointCleanup(GridJobCheckpointCleanupSelfTest.java:88)
> at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.apache.ignite.testframework.junits.GridAbstractTest$6.run(GridAbstractTest.java:2088)
> at java.lang.Thread.run(Thread.java:748){code}
>  
> [^#removeCheckpoint isn't called.txt]
> ^_____________________________________________________________________^
>  
> {code:java}
> [2019-01-25 14:50:03,282][ERROR][main][root] Test failed.
> junit.framework.AssertionFailedError: expected:<-1> but was:<0>
>  at junit.framework.Assert.fail(Assert.java:57)
>  at junit.framework.Assert.failNotEquals(Assert.java:329)
>  at junit.framework.Assert.assertEquals(Assert.java:78)
>  at junit.framework.Assert.assertEquals(Assert.java:234)
>  at junit.framework.Assert.assertEquals(Assert.java:241)
>  at org.apache.ignite.internal.GridJobCheckpointCleanupSelfTest.testCheckpointCleanup(GridJobCheckpointCleanupSelfTest.java:88)
>  at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at org.apache.ignite.testframework.junits.GridAbstractTest$6.run(GridAbstractTest.java:2088)
>  at java.lang.Thread.run(Thread.java:748){code}
> [^#removeCheckpoint is called once more.txt]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
