ignite-dev mailing list archives

From "Alexey Goncharuk (JIRA)" <j...@apache.org>
Subject [jira] [Created] (IGNITE-6939) Exclude false owners from the execution plan based on query response
Date Thu, 16 Nov 2017 15:46:00 GMT
Alexey Goncharuk created IGNITE-6939:
----------------------------------------

             Summary: Exclude false owners from the execution plan based on query response
                 Key: IGNITE-6939
                 URL: https://issues.apache.org/jira/browse/IGNITE-6939
             Project: Ignite
          Issue Type: Task
      Security Level: Public (Viewable by anyone)
            Reporter: Alexey Goncharuk


This is related to IGNITE-6858; the fix made in that ticket can be improved.

The scenario leading to the issue is as follows:
1) Node A holds partition 1 in OWNING state
2) Node B has a local partition map in which partition 1 on node A is marked as OWNING
3) A topology change is triggered which would move partition 1 from A to another node; the
topology version is X
4) A transaction is started on node B on topology X
5) The partition is rebalanced, and node A moves partition 1 to RENTING and then to EVICTED state;
node A updates its local partition map.
6) A new topology change is triggered
7) Node A sends its partition map (transitively) to node B, but since there is a pending exchange,
node B ignores the updated map and still thinks that A owns partition 1 [1]
8) The transaction attempts to execute an SQL query against partition 1 on node A and retries
indefinitely, because node B keeps mapping the partition to node A

[1] The related code is in GridDhtPartitionTopologyImpl#update(AffinityTopologyVersion, GridDhtPartitionFullMap,
CachePartitionFullCountersMap, Set, AffinityTopologyVersion)
{code}
if (stopping || !lastTopChangeVer.initialized() ||
    // Ignore message not-related to exchange if exchange is in progress.
    (exchangeVer == null && !lastTopChangeVer.equals(readyTopVer)))
    return false;
{code}
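
To make the effect of this guard concrete, here is a toy model of step 7. It is not Ignite code:
the class ToyPartitionTopology, its fields, and the use of plain long values in place of
AffinityTopologyVersion are all assumptions made for illustration, and the stopping/initialized
checks are left out. It only shows that a map update arriving outside an exchange is dropped while
the last topology change has not yet become ready.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (hypothetical, not Ignite code) of the guard quoted above. */
class ToyPartitionTopology {
    /** Version of the latest topology change this node has seen (assumed to be a plain long). */
    long lastTopChangeVer = 1;

    /** Latest topology version whose exchange has completed on this node. */
    long readyTopVer = 1;

    /** What this node believes about node A: partition id -> state name. */
    final Map<Integer, String> statesOnNodeA = new HashMap<>();

    /**
     * Applies a partition map received from another node.
     *
     * @param exchangeVer version of the exchange carrying the map, or null if the
     *                    message is not part of an exchange.
     * @return true if the map was applied, false if it was ignored.
     */
    boolean update(Long exchangeVer, Map<Integer, String> newStates) {
        // Same shape as the quoted condition: a non-exchange message is ignored
        // while an exchange is still in progress.
        if (exchangeVer == null && lastTopChangeVer != readyTopVer)
            return false;

        statesOnNodeA.putAll(newStates);
        return true;
    }
}
```

In this model, once lastTopChangeVer moves ahead of readyTopVer (step 6), an update saying that
partition 1 is EVICTED on node A is rejected, so the stale OWNING entry stays in place.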

There are two possibilities to fix this:
1) Make all updates to the partition map in a single thread. We would then no longer need update
sequences and could update the local partition map even when there is a pending exchange (this is
a relatively big, but useful change)
2) Change SQL query execution so that if a node cannot reserve a partition, the partition is not
mapped to this node again on the same topology version (a quick fix)
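
Option 2 could be sketched roughly as follows. This is only an illustration of the idea, not the
actual Ignite implementation: the class FalseOwnerFilter, its method names, the use of String node
ids, and the use of a plain long for the topology version are all assumptions. The query mapper
records each node that failed to reserve a partition and skips it when mapping that partition again
on the same topology version.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper (not part of Ignite) that remembers false owners per topology version. */
class FalseOwnerFilter {
    /** Topology version -> partition -> nodes that failed to reserve it on that version. */
    private final Map<Long, Map<Integer, Set<String>>> rejected = new HashMap<>();

    /** Called when a query response reports that nodeId could not reserve the partition. */
    synchronized void onReservationFailed(long topVer, int part, String nodeId) {
        rejected.computeIfAbsent(topVer, v -> new HashMap<>())
            .computeIfAbsent(part, p -> new HashSet<>())
            .add(nodeId);
    }

    /** Returns owners from the local partition map minus the known false owners. */
    synchronized List<String> candidates(long topVer, int part, List<String> owners) {
        Map<Integer, Set<String>> byPart =
            rejected.getOrDefault(topVer, Collections.emptyMap());
        Set<String> bad = byPart.getOrDefault(part, Collections.emptySet());

        List<String> res = new ArrayList<>();

        for (String owner : owners) {
            if (!bad.contains(owner))
                res.add(owner);
        }

        return res;
    }

    /** Forgets exclusions recorded for topology versions older than newTopVer. */
    synchronized void onTopologyChanged(long newTopVer) {
        rejected.keySet().removeIf(v -> v < newTopVer);
    }
}
```

The exclusion is keyed by topology version on purpose: after the next exchange completes, the local
partition map is expected to be correct again, so old entries can simply be dropped.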



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
