ignite-issues mailing list archives

From "Alexander Paschenko (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (IGNITE-2294) Implement SQL DML (insert, update, delete) clauses.
Date Fri, 05 Aug 2016 15:00:23 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15409553#comment-15409553 ]

Alexander Paschenko edited comment on IGNITE-2294 at 8/5/16 2:59 PM:
---------------------------------------------------------------------

Following the work done to support modifying operations, the JDBC driver has been updated accordingly.

First, to check the type of query parsed from the string against the one requested by the JDBC driver (i.e., requested by calling a specific operation such as *executeQuery* or *executeUpdate*), a helper class *JdbcSqlFieldsQuery* has been introduced that carries an additional flag for the expected operation type.
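A minimal sketch of what such a helper might look like (only the class name matches the actual patch; the field and constructor shape below are assumptions):

{code:java}
// Minimal sketch, assuming the helper simply extends SqlFieldsQuery
// and records which kind of operation the JDBC caller requested.
public class JdbcSqlFieldsQuery extends SqlFieldsQuery {
    /** True if issued via executeQuery, false if via executeUpdate. */
    private final boolean isQry;

    public JdbcSqlFieldsQuery(String sql, boolean isQry) {
        super(sql);
        this.isQry = isQry;
    }

    public boolean isQuery() {
        return isQry;
    }
}
{code}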

Then, to distinguish update cursors from result set cursors, *QueryCursorImpl* received a result type flag as well.

The *executeUpdate* and *executeBatch* methods have been properly implemented in *JdbcStatement* and *JdbcPreparedStatement*; the latter also received the specific argument handling it needed. Currently, batch statements are simply executed one by one, which most likely has to be improved for bulk data loading.
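For reference, this enables plain JDBC batch usage like the following (connection URL and table are illustrative):

{code:java}
import java.sql.*;

public class BatchExample {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:ignite://127.0.0.1/PersonCache");
             PreparedStatement ps = conn.prepareStatement(
                     "insert into Person(_key, name, surname) values (?, ?, ?)")) {
            ps.setInt(1, 1); ps.setString(2, "John"); ps.setString(3, "Smith");
            ps.addBatch();

            ps.setInt(1, 2); ps.setString(2, "Jane"); ps.setString(3, "Roe");
            ps.addBatch();

            // Currently each batch item is executed one by one on the server.
            int[] updCnts = ps.executeBatch();
        }
    }
}
{code}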

Therefore, after looking at MySQL and its JDBC driver, I would like to propose a possible use of the *data streamer* to optimize bulk inserts/merges.

First, let's distinguish *batch* statements in JDBC terms (a single query with many sets of parameters) from *bulk* statements (a multi-row INSERT query, for example). Currently, *bulk* statements are handled efficiently with *IgniteCache.putAll*, since in this case the query arrives at the server in full, in one piece. However, the whole point of the data streamer is to handle an amount of data that is unknown in advance. Therefore, I see *batch* JDBC statements as a nice abstraction over the data streamer, and the usage might be as follows:

When the user issues an *addBatch* call on the driver, we fire either an individual *IgniteDataStreamer.addData(K, V)* call or *IgniteDataStreamer.addData(Map)* (we could split the data into chunks here for efficiency). When the user calls *executeBatch*, we flush the data streamer.

Use of the data streamer could also be made optional, say, via a connection string parameter. If the user opts not to enable the data streamer and still uses batching, we could apply the technique that MySQL uses, namely *rewriting* multiple batch items into a *single bulk insert*.

For example, suppose we have a prepared statement with the query *insert into Person(_key, name, surname) values (?, ?, ?)*, and we then supply two sets of arguments via *addBatch*. When *executeBatch* is fired, we turn it all into the *single* query *insert into Person(_key, name, surname) values (?, ?, ?), (?, ?, ?)* with a single set of arguments (3 x 2 = 6 items) that is handled by *putAll*.
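The rewrite itself is mechanical; here is a sketch of how the multi-row statement could be built (method name and signature are my own, not MySQL's):

{code:java}
import java.util.Collections;

public class BatchRewrite {
    /**
     * Builds a single multi-row INSERT from the VALUES prefix,
     * the number of arguments per row and the batch size.
     */
    static String rewriteBatch(String prefix, int argsPerRow, int rows) {
        StringBuilder row = new StringBuilder("(");
        for (int i = 0; i < argsPerRow; i++)
            row.append(i > 0 ? ", ?" : "?");
        row.append(')');
        return prefix + String.join(", ", Collections.nCopies(rows, row.toString()));
    }

    public static void main(String[] args) {
        // Prints: insert into Person(_key, name, surname) values (?, ?, ?), (?, ?, ?)
        System.out.println(rewriteBatch("insert into Person(_key, name, surname) values ", 3, 2));
    }
}
{code}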

Again, the above example is for the case when the data streamer is disabled; if it is enabled, we just feed the batch items to it as described above. We could also keep one-by-one processing of batch items as one more option for this behavior.

To sum up: the proposal is to add a new connection string parameter with 2 or 3 possible values that controls how batch statements are handled: items are either processed individually, fed to the data streamer, or squashed into a single query that is then processed by *putAll*.
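For illustration only, the parameter might look like this (the name *batchMode* and its values are hypothetical, not part of the patch):

{code}
jdbc:ignite://127.0.0.1/PersonCache?batchMode=individual  // execute items one by one
jdbc:ignite://127.0.0.1/PersonCache?batchMode=streamer    // feed items to the data streamer
jdbc:ignite://127.0.0.1/PersonCache?batchMode=bulk        // rewrite into a single query, then putAll
{code}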

The main downside I see with this approach is that this logic will have to be reimplemented in some form in Ignite's other SQL interfaces (.NET, ODBC, Visor (?)).


> Implement SQL DML (insert, update, delete) clauses.
> ---------------------------------------------------
>
>                 Key: IGNITE-2294
>                 URL: https://issues.apache.org/jira/browse/IGNITE-2294
>             Project: Ignite
>          Issue Type: Wish
>            Reporter: Sergi Vladykin
>            Assignee: Alexander Paschenko
>             Fix For: 1.7
>
>
> We need to add parsing for all the listed SQL commands and translate them into respective
> cache operations (putIfAbsent, put, remove).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
