ignite-issues mailing list archives

From "Taras Ledkov (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (IGNITE-3558) Affinity task hangs when Collision SPI produces a lot of job rejections & Failover SPI produces many attempts
Date Thu, 07 Sep 2017 12:45:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155188#comment-16155188 ]

Taras Ledkov edited comment on IGNITE-3558 at 9/7/17 12:44 PM:
---------------------------------------------------------------

[Test results|https://ci.ignite.apache.org/project.html?projectId=Ignite20Tests&tab=projectOverview&branch_Ignite20Tests=pull%2F1326%2Fhead] look OK to me.

[~vozerov], please take a look.


was (Author: tledkov-gridgain):
Waiting for [test results|https://ci.ignite.apache.org/project.html?projectId=Ignite20Tests&tab=projectOverview&branch_Ignite20Tests=pull%2F1326%2Fhead]

> Affinity task hangs when Collision SPI produces a lot of job rejections & Failover SPI produces many attempts
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-3558
>                 URL: https://issues.apache.org/jira/browse/IGNITE-3558
>             Project: Ignite
>          Issue Type: Bug
>          Components: compute
>            Reporter: Taras Ledkov
>            Assignee: Taras Ledkov
>             Fix For: 2.3
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> The test to reproduce:
> {{IgniteCacheLockPartitionOnAffinityRunWithCollisionSpiTest.testJobFinishing}}
> *Root cause*
> {{GridJobExecuteResponse}} isn't sent from the target node because {{GridJobWorker}} instances get confused in the {{CollisionContext}}.
> *Suggestion*
> The method {{GridJobProcessor.CollisionJobContext.cancel()}} uses {{passiveJobs.remove(jobWorker.getJobId(), jobWorker)}}.
> *passiveJobs* is a ConcurrentHashMap, and {{GridJobWorker.equals()}} is implemented as a comparison of jobIds.
> So, when two threads try to cancel two workers with *the same jobId*, the following can happen:
> - thread0: removes jobWorker0 and cancels it;
> - thread0: puts jobWorker1 (because jobWorker0 has already been removed);
> - thread1: still holds a reference to jobWorker0 and tries to cancel it;
> - thread1: removes jobWorker1 instead of jobWorker0 (because workers are compared by jobId);
> - thread1: doesn't send ExecuteResponse because jobWorker0 has already been canceled.
> *Proposal*
> Try using the default (identity-based) equals for {{GridJobWorker}}.
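
The interleaving described in the issue can be replayed deterministically on a single thread. The sketch below is only an illustration under stated assumptions: {{IdEqualsWorker}} and {{IdentityWorker}} are hypothetical stand-ins for {{GridJobWorker}}, not Ignite classes, and {{passiveJobs}} is modeled as a plain {{ConcurrentHashMap}} keyed by job ID. It relies only on the documented behavior of {{ConcurrentMap.remove(key, value)}}, which removes the entry when the mapped value is equal to the given one.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PassiveJobsRaceSketch {
    /** Hypothetical stand-in for a worker whose equals() compares job IDs only. */
    static class IdEqualsWorker {
        final UUID jobId;

        IdEqualsWorker(UUID jobId) { this.jobId = jobId; }

        @Override public boolean equals(Object o) {
            return o instanceof IdEqualsWorker && jobId.equals(((IdEqualsWorker) o).jobId);
        }

        @Override public int hashCode() { return jobId.hashCode(); }
    }

    /** Hypothetical stand-in that keeps the default identity-based equals(). */
    static class IdentityWorker {
        final UUID jobId;

        IdentityWorker(UUID jobId) { this.jobId = jobId; }
    }

    /** Replays the interleaving with ID-based equality; returns the result of the stale remove. */
    public static boolean staleRemoveWithIdEquals() {
        UUID jobId = UUID.randomUUID();
        ConcurrentMap<UUID, IdEqualsWorker> passiveJobs = new ConcurrentHashMap<>();

        IdEqualsWorker worker0 = new IdEqualsWorker(jobId);
        IdEqualsWorker worker1 = new IdEqualsWorker(jobId);
        passiveJobs.put(jobId, worker0);

        // "thread0": removes (and cancels) worker0, then worker1 is registered under the same ID.
        passiveJobs.remove(jobId, worker0);
        passiveJobs.put(jobId, worker1);

        // "thread1": still holds worker0 and tries to remove it.
        // Because worker0.equals(worker1), this removes worker1 instead.
        return passiveJobs.remove(jobId, worker0);
    }

    /** Same replay with identity-based equality; the stale remove leaves worker1 in place. */
    public static boolean staleRemoveWithIdentityEquals() {
        UUID jobId = UUID.randomUUID();
        ConcurrentMap<UUID, IdentityWorker> passiveJobs = new ConcurrentHashMap<>();

        IdentityWorker worker0 = new IdentityWorker(jobId);
        IdentityWorker worker1 = new IdentityWorker(jobId);
        passiveJobs.put(jobId, worker0);

        passiveJobs.remove(jobId, worker0);
        passiveJobs.put(jobId, worker1);

        // worker0 != worker1 by identity, so nothing is removed here.
        return passiveJobs.remove(jobId, worker0);
    }

    public static void main(String[] args) {
        System.out.println("ID-based equals, stale remove succeeded: " + staleRemoveWithIdEquals());
        System.out.println("Identity equals, stale remove succeeded: " + staleRemoveWithIdentityEquals());
    }
}
```

With ID-based equality the stale remove returns true and evicts the newer worker, which matches the case where no ExecuteResponse is sent. With identity equality it returns false and the newer worker stays registered.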



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
