ignite-issues mailing list archives

From "Vladimir Ozerov (JIRA)" <j...@apache.org>
Subject [jira] [Created] (IGNITE-11498) SQL: Rework DML data distribution logic
Date Thu, 07 Mar 2019 06:54:00 GMT
Vladimir Ozerov created IGNITE-11498:
----------------------------------------

             Summary: SQL: Rework DML data distribution logic
                 Key: IGNITE-11498
                 URL: https://issues.apache.org/jira/browse/IGNITE-11498
             Project: Ignite
          Issue Type: Task
          Components: sql
            Reporter: Vladimir Ozerov
             Fix For: 2.8


The current DML implementation has a number of problems:
1) We fetch the whole data set to the originator node. There is a "skipDmlOnReducer" flag to avoid
this in some cases, but it is still experimental and is not enabled by default.
2) Updates are deadlock-prone: we update entries in batches equal to {{SqlFieldsQuery.pageSize}},
so we can easily deadlock with concurrent cache operations.
3) We have very strange retry logic. It is not clear why it is needed in the first place,
given that DML is not transactional and no guarantees are needed.

Proposal:
# Implement proper routing logic: if a request can be executed on data nodes, bypassing the
reducer, do this. Otherwise fetch all data to the reducer. This decision should be made in exactly
the same way as for MVCC (see {{GridNearTxQueryEnlistFuture}} as a starting point).
# Distribute updates to primary data nodes in batches, but apply them one by one, similar to
data streamer with {{allowOverwrite=false}}. Do not do any partition state or {{AffinityTopologyVersion}}
checks, since DML is not transactional. Return update counts back and aggregate them.
# Remove or at least rethink retry logic. Why do we need it in the first place?
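
To make the second proposal item more concrete, here is a minimal sketch in plain Java. It is not Ignite code and uses no Ignite API: {{DataNode}}, {{primaryNode}} and {{distribute}} are hypothetical names, the modulo mapping only stands in for a real affinity function, and the "send" is a local method call. It shows updates being grouped per primary node, shipped in batches, applied entry by entry without any batch-wide locking, and the update counts being summed on the originator.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class DmlDistributionSketch {
    /** Hypothetical stand-in for a data node that owns part of the key space. */
    static final class DataNode {
        final Map<Integer, String> store = new ConcurrentHashMap<>();

        /** Applies a batch one entry at a time and returns the number of applied updates. */
        long applyBatch(List<Map.Entry<Integer, String>> batch) {
            long updated = 0;

            for (Map.Entry<Integer, String> e : batch) {
                // Each entry is applied on its own; the batch is never locked as a whole.
                store.put(e.getKey(), e.getValue());
                updated++;
            }

            return updated;
        }
    }

    /** Hypothetical affinity mapping (assumption): key -> index of its primary node. */
    static int primaryNode(int key, int nodeCnt) {
        return Math.floorMod(key, nodeCnt);
    }

    /** Routes updates to primary nodes in batches and aggregates the update counts. */
    static long distribute(Map<Integer, String> updates, List<DataNode> nodes, int batchSize) {
        Map<Integer, List<Map.Entry<Integer, String>>> pending = new HashMap<>();
        long total = 0;

        for (Map.Entry<Integer, String> e : updates.entrySet()) {
            int idx = primaryNode(e.getKey(), nodes.size());
            List<Map.Entry<Integer, String>> batch = pending.computeIfAbsent(idx, k -> new ArrayList<>());

            batch.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));

            // Flush a full batch to its primary node (a local call in this sketch).
            if (batch.size() >= batchSize) {
                total += nodes.get(idx).applyBatch(batch);
                batch.clear();
            }
        }

        // Flush the remaining partial batches.
        for (Map.Entry<Integer, List<Map.Entry<Integer, String>>> p : pending.entrySet()) {
            if (!p.getValue().isEmpty())
                total += nodes.get(p.getKey()).applyBatch(p.getValue());
        }

        return total;
    }

    public static void main(String[] args) {
        List<DataNode> nodes = new ArrayList<>();

        for (int i = 0; i < 3; i++)
            nodes.add(new DataNode());

        Map<Integer, String> updates = new HashMap<>();

        for (int k = 0; k < 10; k++)
            updates.put(k, "v" + k);

        System.out.println("updated rows: " + distribute(updates, nodes, 2));
    }
}
```

The sketch deliberately performs no topology or partition state checks, in line with the proposal. A real implementation would still have to decide what happens to a batch whose target node leaves the cluster, which is where the retry question from the last item comes in.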




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
