ignite-dev mailing list archives

From Vladimir Ozerov <voze...@gridgain.com>
Subject Re: Batch DML queries design discussion
Date Thu, 08 Dec 2016 12:49:54 GMT
Sergi,

If a user calls a single *execute()* operation, then most likely it is not
batching. We should not rely on the strange case where a user performs
batching without using the standard and well-adopted JDBC batching API. The
main problem with the streamer is that it is async and hence breaks
happens-before guarantees in a single thread: a SELECT after an INSERT might
not return the inserted value.
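
To illustrate the ordering concern, here is a minimal, self-contained sketch.
It does not use Ignite or JDBC at all: ExplicitBatch and BufferingStreamer are
hypothetical stand-ins, a plain HashMap plays the role of the table, and the
streamer's asynchrony is approximated by a size-triggered flush. It only aims
to show why a read right after a streamed write may miss the value, while a
read after an explicit batch execution does not.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class BatchVsStreamerSketch {
    /** Hypothetical stand-in for JDBC batching: rows become visible when executeBatch() returns. */
    static class ExplicitBatch {
        private final Map<Integer, String> table;
        private final Map<Integer, String> pending = new LinkedHashMap<>();

        ExplicitBatch(Map<Integer, String> table) {
            this.table = table;
        }

        void addBatch(int key, String val) {
            pending.put(key, val); // buffered locally, nothing is written yet
        }

        int executeBatch() {
            int n = pending.size();
            table.putAll(pending); // all rows are applied before the call returns
            pending.clear();
            return n;
        }
    }

    /** Hypothetical stand-in for a streamer: rows become visible only on flush or close. */
    static class BufferingStreamer implements AutoCloseable {
        private final Map<Integer, String> table;
        private final Map<Integer, String> buffer = new LinkedHashMap<>();
        private final int flushSize;

        BufferingStreamer(Map<Integer, String> table, int flushSize) {
            this.table = table;
            this.flushSize = flushSize;
        }

        void addData(int key, String val) {
            buffer.put(key, val);
            if (buffer.size() >= flushSize) {
                flush(); // the caller does not control when this happens
            }
        }

        private void flush() {
            table.putAll(buffer);
            buffer.clear();
        }

        @Override
        public void close() {
            flush(); // closing guarantees that everything buffered is loaded
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> table = new HashMap<>();

        ExplicitBatch batch = new ExplicitBatch(table);
        batch.addBatch(1, "a");
        batch.executeBatch();
        System.out.println("read after executeBatch: " + table.get(1)); // value is visible

        try (BufferingStreamer streamer = new BufferingStreamer(table, 100)) {
            streamer.addData(2, "b");
            System.out.println("read before close: " + table.get(2)); // still null
        }
        System.out.println("read after close: " + table.get(2)); // value is visible
    }
}
```

In this sketch the "SELECT after INSERT" surprise is the null read inside the
try block: the write was accepted but has not been applied yet.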

Honestly, I do not really understand why we are trying to reinvent the
wheel here. There is a standard API - let's just use it and make it flexible
enough to take advantage of IgniteDataStreamer if needed.
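
For reference, this is roughly what the standard batching API looks like from
the user's side. It is a minimal sketch using only java.sql calls; the helper
name insertAll, the Person table and its columns are hypothetical, and nothing
here says how a driver would execute the batch internally (with or without a
streamer).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

class JdbcBatchInsertSketch {
    /**
     * Adds one parameter set per row and sends them with a single executeBatch() call.
     * Returns the per-row update counts reported by the driver.
     */
    static int[] insertAll(Connection conn, Map<Integer, String> rows) throws SQLException {
        // Table and column names are assumptions made up for this example.
        String sql = "INSERT INTO Person (id, name) VALUES (?, ?)";

        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Map.Entry<Integer, String> row : rows.entrySet()) {
                ps.setInt(1, row.getKey());
                ps.setString(2, row.getValue());
                ps.addBatch(); // only records the parameter set locally
            }
            return ps.executeBatch(); // the explicit point at which the batch is executed
        }
    }
}
```

The user decides when the data is sent by calling executeBatch(), which is the
point where a driver could hand the accumulated rows to whatever mechanism it
prefers.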

Is there any use case which is not covered by this solution? Or let me
ask from the opposite side - are there any well-known JDBC drivers which
perform batching/streaming from non-batched update statements?

Vladimir.

On Thu, Dec 8, 2016 at 3:38 PM, Sergi Vladykin <sergi.vladykin@gmail.com>
wrote:

> Vladimir,
>
> I see no reason to forbid Streamer usage from non-batched statement
> execution.
> It is common that users already have their ETL tools and you can't be sure
> if they use batching or not.
>
> Alex,
>
> I guess we have to decide on Streaming first and then we will discuss
> Batching separately, ok? Because this decision may become important for
> batching implementation.
>
> Sergi
>
> 2016-12-08 15:31 GMT+03:00 Andrey Gura <agura@apache.org>:
>
> > Alex,
> >
> > In most cases JdbcQueryTask should be executed locally on client node
> > started by JDBC driver.
> >
> > JdbcQueryTask.QueryResult res =
> >     loc ? qryTask.call() :
> >         ignite.compute(ignite.cluster().forNodeId(nodeId)).call(qryTask);
> >
> > Is it valid behavior after introducing DML functionality?
> >
> > In cases when a user wants to execute a query on a specific node, he
> > should fully understand what he wants and what can go wrong.
> >
> >
> > On Thu, Dec 8, 2016 at 3:20 PM, Alexander Paschenko
> > <alexander.a.paschenko@gmail.com> wrote:
> > > Sergi,
> > >
> > > JDBC batching might work quite differently from driver to driver. Say,
> > > MySQL happily rewrites queries as I had suggested in the beginning of
> > > this thread (it's not the only strategy, but one of the possible
> > > options) - and, BTW, would like to hear at least an opinion about it.
> > >
> > > On your first approach, section before streamer: you suggest that we
> > > send single statement and multiple param sets as a single query task,
> > > am I right? (Just to make sure that I got you properly.) If so, do you
> > > also mean that API (namely JdbcQueryTask) between server and client
> > > should also change? Or should new API means be added to facilitate
> > > batching tasks?
> > >
> > > - Alex
> > >
> > > 2016-12-08 15:05 GMT+03:00 Sergi Vladykin <sergi.vladykin@gmail.com>:
> > >> Guys,
> > >>
> > > >> I discussed this feature with Dmitriy and we came to conclusion that
> > > >> batching in JDBC and Data Streaming in Ignite have different
> > > >> semantics and performance characteristics. Thus they are independent
> > > >> features (they may work together, may separately, but this is
> > > >> another story).
> > >>
> > >> Let me explain.
> > >>
> > >> This is how JDBC batching works:
> > >> - Add N sets of parameters to a prepared statement.
> > >> - Manually execute prepared statement.
> > >> - Repeat until all the data is loaded.
> > >>
> > >>
> > >> This is how data streamer works:
> > >> - Keep adding data.
> > > >> - Streamer will buffer and load buffered per-node batches when they
> > > >> are big enough.
> > >> - Close streamer to make sure that everything is loaded.
> > >>
> > > >> As you can see, there is a difference in the semantics of when we
> > > >> send data: if our JDBC driver allows sending batches to nodes without
> > > >> calling `execute` (and we would probably need to make `execute` a
> > > >> no-op here), then we violate the semantics of JDBC; if we disallow
> > > >> this behavior, then this batching will underperform.
> > >>
> > > >> Thus I suggest keeping these features (JDBC Batching and JDBC
> > > >> Streaming) as separate features.
> > >>
> > > >> As I already said they can work together: Batching will batch
> > > >> parameters and on `execute` they will go to the Streamer in one shot
> > > >> and Streamer will deal with the rest.
> > >>
> > >> Sergi
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> 2016-12-08 14:16 GMT+03:00 Vladimir Ozerov <vozerov@gridgain.com>:
> > >>
> > >>> Hi Alex,
> > >>>
> > > >>> To my understanding there are two possible approaches to batching
> > > >>> in the JDBC layer:
> > > >>>
> > > >>> 1) Rely on the default batching API, specifically
> > > >>> *PreparedStatement.addBatch()* [1] and others. This is a nice and
> > > >>> clear API, users are used to it, and its adoption will minimize
> > > >>> user code changes when migrating from other JDBC sources. We simply
> > > >>> copy updates locally and then execute them all at once with only a
> > > >>> single network hop to servers. *IgniteDataStreamer* can be used
> > > >>> underneath.
> > >>>
> > >>> 2) Or we can have separate connection flag which will move all
> > >>> INSERT/UPDATE/DELETE statements through streamer.
> > >>>
> > > >>> I prefer the first approach.
> > >>>
> > > >>> Also we need to keep in mind that data streamer has poor
> > > >>> performance when adding single key-value pairs due to high overhead
> > > >>> on concurrency and other bookkeeping. Instead, it is better to
> > > >>> pre-batch key-value pairs before giving them to streamer.
> > >>>
> > >>> Vladimir.
> > >>>
> > >>> [1]
> > > >>> https://docs.oracle.com/javase/8/docs/api/java/sql/PreparedStatement.html#addBatch--
> > >>>
> > >>> On Thu, Dec 8, 2016 at 1:21 PM, Alexander Paschenko <
> > >>> alexander.a.paschenko@gmail.com> wrote:
> > >>>
> > >>> > Hello Igniters,
> > >>> >
> > > >>> > One of the major improvements to DML has to be support of batch
> > > >>> > statements. I'd like to discuss its implementation. The suggested
> > > >>> > approach is to rewrite the given query, turning it from a few
> > > >>> > INSERTs into a single statement and processing arguments
> > > >>> > accordingly. I suggest this because the whole point of batching
> > > >>> > is to make as few interactions with the cluster as possible and
> > > >>> > to make operations as condensed as possible, and in the case of
> > > >>> > Ignite it means that we should send as few JdbcQueryTasks as
> > > >>> > possible. And, as long as a query task holds a single query and
> > > >>> > its arguments, this approach will not require any changes to the
> > > >>> > current design and won't break any backward compatibility - all
> > > >>> > the dirty work of rewriting will be done by the JDBC driver.
> > > >>> > Without rewriting, we could introduce some new query task for
> > > >>> > batch operations, but that would make it impossible to send such
> > > >>> > requests from newer clients to older servers (say, servers of
> > > >>> > version 1.8.0, which do not know about batching, let alone older
> > > >>> > versions).
> > > >>> > I'd like to hear comments and suggestions from the community.
> > > >>> > Thanks!
> > >>> >
> > >>> > - Alex
> > >>> >
> > >>>
> >
>
