ignite-dev mailing list archives

From Dmitriy Setrakyan <dsetrak...@apache.org>
Subject Re: DML data streaming
Date Sat, 11 Feb 2017 00:31:13 GMT
On Fri, Feb 10, 2017 at 12:49 AM, Alexander Paschenko <
alexander.a.paschenko@gmail.com> wrote:

> Dima,
> >
> > There are several ways to handle it. I would check how other databases
> > handle it, maybe we can borrow something. To the least, we should log
> > such errors in the log for now.
> >
> Logging errors would mean introducing some kind of stream receiver to
> do that and thus that would be really the same performance penalty for
> the successful operations. I think we should go with that optional
> flag for semantics after all.

I am OK with introducing some error trap and plugging it into the
configuration (maybe an interface with an onError(...) callback). However,
we should never swallow errors; we should always print all of them to the
log. Let's not worry about performance in the error case.
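
To make the idea concrete, here is a rough sketch of such a trap. The names
StreamErrorHandler and StreamingErrorTrap are hypothetical and are not
existing Ignite APIs; the only point is that the error is always logged
first, and the optional user callback is notified afterwards.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical callback interface; not an existing Ignite API. */
interface StreamErrorHandler {
    void onError(Object key, Throwable err);
}

/** Hypothetical trap: always logs the error, then notifies the optional user handler. */
class StreamingErrorTrap {
    private static final Logger LOG = Logger.getLogger(StreamingErrorTrap.class.getName());

    /** User-supplied handler from configuration; may be null. */
    private final StreamErrorHandler userHandler;

    StreamingErrorTrap(StreamErrorHandler userHandler) {
        this.userHandler = userHandler;
    }

    /** Invoked only when a streamed operation fails. */
    void report(Object key, Throwable err) {
        // Never swallow the error: log it regardless of whether a handler is configured.
        LOG.log(Level.SEVERE, "Streaming DML operation failed for key " + key, err);

        if (userHandler != null)
            userHandler.onError(key, err);
    }
}
```

A user who wants custom handling would pass a lambda such as
(key, err) -> failedKeys.add(key); with no handler configured, the error
still ends up in the log.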

> > You don't have to use _key. Primary key is usually a field in the class,
> > so you can use a normal column name. In any case, we should remove any
> > usage of _key before 2.0 is released.
> >
> > Again, if user does not have to specify _key on INSERT, then it is very
> > unclear to me, why user would need to specify _key for UPDATE or DELETE.
> > Something smells here. Can you please provide an example?
> >
> UPDATE and DELETE _in streaming mode_ are carried _only_ for "fast"
> optimized cases - i.e. those where _key (and possibly _val) are
> explicitly specified by the user thus allowing us to map UPDATE and
> DELETE directly to cache's replace and remove operations without
> messing with entry processors and doing map-reduce SELECT by given
> criteria.
> Say, we have Person { firstName, secondName } with key class
> Key { id1, id2 }
> If I say DELETE from Person WHERE _key = ? and specify arg via JDBC,
> there's no need to do any SELECT - we can just call IgniteCache.remove
> on that key.
> But if I say DELETE from Person WHERE id1 = 5 then there's no way to
> avoid MR - we have to find all keys that interest us first by doing
> SELECT as long as we know only partly about what keys the user wants
> to be affected.
> It works in the same way for UPDATE. And I hope that it's clear how
> it's different from INSERT - there's no MR by definition (we don't
> allow INSERT FROM SELECT in streaming mode).

Do we allow INSERT from SELECT in non-streaming mode?

> AGAIN: this all is said only about streaming mode; non streaming mode
> does those optimizations too, but it also allows complex conditions,
> while streaming mode does not allow them to keep things fast and avoid
> MR.
> That's the reason why I suggest that we drop UPDATE and DELETE from
> DML streaming as they mean messing with those soon-hidden columns.
> Still we could optimize stuff like DELETE from Person WHERE id1 = 5
> AND id2 = 6 - query involves ALL fields of key AND compares only for
> equality AND has no complex expressions - we can construct key
> unambiguously and still call remove directly.

Exactly my point. If all key fields are present, we can construct the key
ourselves and still delegate to cache.put(..) or cache.remove(..). In all
cases where some of the key fields are missing, we should do a regular MR.
I am assuming that this applies to both UPDATE and DELETE operations. My
vote is to implement this functionality.
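
A minimal sketch of that decision, under stated assumptions: the WHERE
clause has already been parsed into a map of "column = value" equalities,
a plain Map stands in for the cache, and the key is represented as a list
of field values. KeyFastPath and its methods are hypothetical names, not
Ignite code. The fast path is taken only when the equalities cover exactly
the key fields; anything else falls back to MR.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper; a plain Map stands in for the cache. */
class KeyFastPath {
    /**
     * @param keyFields  names of all key fields, e.g. id1 and id2
     * @param equalities column to value, one entry per "col = value" condition in WHERE
     * @return key field values in keyFields order if WHERE pins down exactly one key
     */
    static Optional<List<Object>> tryBuildKey(List<String> keyFields, Map<String, Object> equalities) {
        // Assumption: the fast path applies only if the conditions cover exactly the key fields.
        if (!equalities.keySet().equals(new HashSet<>(keyFields)))
            return Optional.empty();

        List<Object> key = new ArrayList<>();
        for (String field : keyFields)
            key.add(equalities.get(field));

        return Optional.of(key);
    }

    /** Returns which path was taken; the MR branch is only a placeholder here. */
    static String delete(Map<List<Object>, String> cache, List<String> keyFields,
        Map<String, Object> equalities) {
        Optional<List<Object>> key = tryBuildKey(keyFields, equalities);

        if (key.isPresent()) {
            cache.remove(key.get()); // direct removal, no SELECT needed
            return "direct-remove";
        }

        return "map-reduce"; // a real implementation would SELECT the matching keys first
    }
}
```

With key fields (id1, id2), the conditions id1 = 5 AND id2 = 6 take the
direct path, while id1 = 5 alone falls back to MR.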

> But to me it does not sound like a really great reason to leave UPDATE
> and DELETE in DML - the users will have to write some specific queries
> to use that while all other stuff will just be declined in that mode.
> And, as I said before, UPDATE and DELETE don't probably perfectly fit
> with primary data streamer use cases - after all, modifying existing
> stuff is not what data streamer is about.

I am not sure what this means. We have to work in the same way as regular
RDBMS systems. I would not try to reinvent the wheel here. UPDATE, DELETE,
and INSERT operations should all be part of DML.

> And regarding hiding columns: it's unclear how things will look like
> for caches like <int, int> when we remove _key and _val as long as
> tables for such cases currently have nothing but those two columns.

Again, think about standard RDBMS systems. None of them have _key or _val,
and therefore neither should we.
