ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vladimir Ozerov <voze...@gridgain.com>
Subject Re: DDL implementation details
Date Fri, 27 Jan 2017 08:07:57 GMT
Denis,

>>> Inventing custom DDL statements is also a dead end.
Questionable. Every database has custom SQL, including DDL, are they all
"dead-ended"? Each database has unique set of feature and unique
architecture. The very goal of custom SQL is to give user control over
these differences. Without it users might easily get stuck with some
architectural limitation or performance issue, which cannot be resolved
with ANSI SQL. For this reason our very goal is not to make migration
strictly transparent - "without a single SQL line change!". It is
impossible (recall widespread enterprise myth that major Hibernate
advantage is ability to "switch" between databases transparently -
bullshit). Our goal is to make migration *smooth *on the one hand, without
loosing Ignite advantages on the other.

>>> Most of us suggest the users never store multiple types (queries
entities) per cache consider this as an anti-pattern.
Personally, I am absolutely comfortable with having multiple entities per
cache. As Sergi mentioned before, it worked fine for years. I am against
*forcing* users to put everything into single cache at the risk of total
SQL rewrite otherwise.

Having said that I do not support idea of having cache-per-table rule
either. This way we are moving from one extreme to another. Both would add
limitations to the product. For instance, as Alex mentioned, empty cache
have serious memory overhead at the moment, and it is not resolved still.
Another problem is memory pools which we expect in AI 2.0 - it will be
difficult to manage memory if you always create separate caches.

I would design it as follows:
1) SCHEMA - logical entity. This is metadata stored somewhere (e.g. in
INFORMATIONAL_SCHEMA cache) which brings database objects together (tables,
views, stored procedures, sequences, security rights, etc.);
2) CREATE CACHE - creates cache;
3) Define several default caches for the most common cases. I see two for
now - PARTITIONED for normal data, REPLICATED for reference data. This way
user will not bother with CREATE CACHE in usual scenarios.

Thoughts?

Vladimir.


On Fri, Jan 27, 2017 at 5:51 AM, Denis Magda <dmagda@apache.org> wrote:

> Agree that schema-per-cache approach is natural for Ignite but will be a
> nightmare for those who migrate from a RDBMS. Inventing  custom DDL
> statements is also a dead end. This won’t make users life and transitions
> from a RDBMS easier.
>
> Let’s to look at this differently. Most of us suggest the users never
> store multiple types (queries entities) per cache consider this as an
> anti-pattern. I do remember that we even brought up an idea to apply
> query-entity-per-cache rule.
>
> Conjuring that Ignite follows query-entity-per-cache rule, DDL statements
> might have the following effect:
>
> CREATE TABLE blablabla {…} - creates both cache and query entity with name
> blablabla.
>
> In SQL queries we don’t need to specify a cache name as a schema name at
> all even during joins thanks to cache-per-query-entity rule. We can figure
> this out automatically.
>
> Executing `CREATE SCHEME myScheme` and using it later in DDL statements
> (USING SCHEME myScheme) we will add the scheme name as an attribute to
> respective query entities or caches. When the scheme name is used in a SQL
> query the engine will look for a valid query entity/cache that has the
> attribute set to this name.
>
> I do realize that this will break the compatibility but we’re approaching
> 2.0. Good chance to make not only DDL but SQL Grid more user friendly.
>
> —
> Denis
>
> > On Jan 17, 2017, at 7:29 PM, Dmitriy Setrakyan <dsetrakyan@apache.org>
> wrote:
> >
> > After looking at all the restrictions of "cache==schema" approach, I
> still
> > think that it would be the most natural match for Ignite. All other
> > suggestions are either too limiting, too hard, or not applicable for
> > Ignite.
> >
> > My vote would be to start with "cache==schema" and get some user
> feedback.
> > We can always add "tablespaces" in future, if our users demand it.
> >
> > D.
> >
> > On Tue, Jan 17, 2017 at 1:29 AM, Sergi Vladykin <
> sergi.vladykin@gmail.com>
> > wrote:
> >
> >> Sergey,
> >>
> >> See inline:
> >>
> >> 2017-01-17 0:50 GMT+03:00 Sergey Kozlov <skozlov@gridgain.com>:
> >>
> >>> For the approach schema==cache it is not clear how it works for some
> >> cases
> >>> (especialy if SQL and regular cache operations are mixed):
> >>>
> >>> 1. I want to store two tables in same cache (same schema) and both
> tables
> >>> have identical structure like id Integer, name String.
> >>> How they will be processed for regular cache operations
> >>> put/putall/get/getall etc where we get cache instance by its name (on
> SQL
> >>> level we use the table name) ?
> >>>
> >>
> >> This already works. We can have multiple QueryEntities configured for a
> >> single cache. The only restriction here is that cache key must be unique
> >> across all the tables inside the same schema.
> >>
> >>
> >>>
> >>> 2. What's about ALTER TABLE statement? Will the changes for one table
> >> lock
> >>> all tables in the cache?
> >>>
> >>
> >> It will not be needed, any table structure modification will be
> performed
> >> in the table scope.
> >>
> >>
> >>> 3. In the future if we will introduce table-based features like SQL
> roles
> >>> (as next step of SQL implementation) "many tables one cache" rule looks
> >>> like more difficult for its implementation.
> >>>
> >>
> >> I don't think so, must be the same thing.
> >>
> >>
> >>>
> >>> p.s. may be we need to think about new entity called "table" (based on
> >> type
> >>> descriptor) inside cache to better support of SQL DDL
> >>>
> >>
> >> We already have it: QueryEntity. It has exactly that semantics.
> >>
> >> Sergi
> >>
> >>
> >>>
> >>>
> >>> On Mon, Jan 16, 2017 at 11:06 PM, Sergi Vladykin <
> >> sergi.vladykin@gmail.com
> >>>>
> >>> wrote:
> >>>
> >>>> Dima,
> >>>>
> >>>> I agree that cache==table is definitely a wrong choice, but as far as
> I
> >>> see
> >>>> Vova suggests having cache==tablespace instead of cache==schema. I
> tend
> >>> to
> >>>> agree with this decoupling of physical and logical grouping, but the
> >>>> concern is that it will require much more work to do.
> >>>>
> >>>> Sergi
> >>>>
> >>>> 2017-01-16 21:35 GMT+03:00 Dmitriy Setrakyan <dsetrakyan@apache.org>:
> >>>>
> >>>>> Vova,
> >>>>>
> >>>>> Currently I see only 2 ways we can proceed here:
> >>>>>
> >>>>>   1. cache == table
> >>>>>   2. cache == schema
> >>>>>
> >>>>> I agree that "cache==table" may be more flexible, but I don't think
> >> it
> >>>> will
> >>>>> work in Ignite.
> >>>>> We may end up with 1,000s of caches, which will carry significant
> >>>> overhead
> >>>>> on memory and cluster overall. I think that we have no choice but
to
> >>> take
> >>>>> "cache==schema" approach.
> >>>>>
> >>>>> D.
> >>>>>
> >>>>> On Mon, Jan 16, 2017 at 1:00 AM, Vladimir Ozerov <
> >> vozerov@gridgain.com
> >>>>
> >>>>> wrote:
> >>>>>
> >>>>>> Sergi, Dima,
> >>>>>>
> >>>>>> In the scope of Ignite 1.x it is perfectly fine to have "schema
=
> >>>> cache".
> >>>>>> Nobody suffers from it because nobody use Ignite as database.
But
> >> in
> >>>>>> future, thanks to page memory, we are going to target real database
> >>> use
> >>>>>> cases. Users will have multiple tables in Ignite. Plus views,
> >>> triggers,
> >>>>>> constraints, etc.. All these features are very useful and easy
to
> >>>>> implement
> >>>>>> provided that we already have table and index implementations.
And
> >> in
> >>>>>> databases all related objects are *logically *grouped in a
> >> "schema".
> >>>> This
> >>>>>> is convenient for users: less boilerplate in SQL, better
> >>> manageability
> >>>>>> (remember that database users will definitely need some console
> >>> and/or
> >>>> UI
> >>>>>> tools to manage Ignite as a database).
> >>>>>>
> >>>>>> What you offer is to group database objects *physically *rather
> >> than
> >>>>>> logically. It will lead to:
> >>>>>> - Boilerplate in queries
> >>>>>> - Inconvenient database management. All the things database
users
> >> are
> >>>>> used
> >>>>>> to - import/export tools, UIs, "USING" keyword, etc, will look
> >> weird
> >>> in
> >>>>>> Ignite as there will be no way to group arbitrary objects
> >> logically.
> >>>>>>
> >>>>>> With this approach almost every user will have to use two schemes
> >>>> instead
> >>>>>> of one - one for operational data (PARTITIONED) and one for
> >> reference
> >>>>> data
> >>>>>> (REPLICATED). No conventional database works this way.
> >>>>>>
> >>>>>> Vladimir.
> >>>>>>
> >>>>>> On Fri, Jan 13, 2017 at 9:18 PM, Dmitriy Setrakyan <
> >>>>> dsetrakyan@apache.org>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Vova,
> >>>>>>>
> >>>>>>> I will join Sergi here. It seems like "schema = cache" will
take
> >>> care
> >>>>> of
> >>>>>>> all different configuration properties required for different
> >>> groups
> >>>> of
> >>>>>>> caches. In addition, it cleanly maps into current Ignite
> >>>> architecture.
> >>>>> We
> >>>>>>> will need to have a very strong reason to move away from
it.
> >>>>>>>
> >>>>>>> D.
> >>>>>>>
> >>>>>>> On Fri, Jan 13, 2017 at 2:39 AM, Vladimir Ozerov <
> >>>> vozerov@gridgain.com
> >>>>>>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Correct, it worked, because Ignite has never had real
database
> >>> use
> >>>>> case
> >>>>>>> in
> >>>>>>>> mind. Unfortunately, if our global plans go as expected,
it
> >> will
> >>>> not
> >>>>>> work
> >>>>>>>> for Ignite 2.x+.
> >>>>>>>>
> >>>>>>>> On Fri, Jan 13, 2017 at 11:53 AM, Sergi Vladykin <
> >>>>>>> sergi.vladykin@gmail.com
> >>>>>>>>>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Lets move on with SQL schema == Ignite cache. It
worked
> >> always
> >>>> like
> >>>>>>>> this, I
> >>>>>>>>> see no reasons to change this.
> >>>>>>>>>
> >>>>>>>>> Sergi
> >>>>>>>>>
> >>>>>>>>> 2017-01-13 11:20 GMT+03:00 Vladimir Ozerov <
> >>> vozerov@gridgain.com
> >>>>> :
> >>>>>>>>>
> >>>>>>>>>> "Tablespace" (Oracle, PostgreSQL) is what maps
better than
> >>>>> "schema"
> >>>>>>> to
> >>>>>>>>> our
> >>>>>>>>>> cache. But not ideally still.
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Jan 13, 2017 at 11:10 AM, Vladimir Ozerov
<
> >>>>>>>> vozerov@gridgain.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Alex,
> >>>>>>>>>>>
> >>>>>>>>>>> Currently Ignite is not used as database.
It is used as
> >>>> search
> >>>>>>>> engine -
> >>>>>>>>>>> several types, several tables, several joins.
This is why
> >>>>> having
> >>>>>>>>> "SCHEMA
> >>>>>>>>>> ==
> >>>>>>>>>>> cache" was never a problem. Users have never
build
> >> complex
> >>>> SQL
> >>>>>>>>>> applications
> >>>>>>>>>>> on top of Ignite. But we are going towards
database. And
> >> my
> >>>>>>> question
> >>>>>>>>>> stands
> >>>>>>>>>>> still - suppose it is Y2019, how is user
going to migrate
> >>> his
> >>>>>>>> database
> >>>>>>>>>>> containing 20-30-50-100 tables in a single
schema in
> >> Oracle
> >>>> to
> >>>>>>>> Ignite?
> >>>>>>>>>>>
> >>>>>>>>>>> Single cache for all tables? Doens't work
- not flexible.
> >>>> Users
> >>>>>>> will
> >>>>>>>>>>> definitely require different cache modes,
different
> >>>> co-location
> >>>>>>>> rules,
> >>>>>>>>>>> different number of backups, etc..
> >>>>>>>>>>> Schema per table? Doesn't work either -
unmanageable and
> >>> not
> >>>>>>>> convenient
> >>>>>>>>>>> for users even for relatively small databases.
> >>>>>>>>>>>
> >>>>>>>>>>> From user perspective schema is logical
grouping of
> >>> database
> >>>>>>> objects,
> >>>>>>>>>>> nothing more.
> >>>>>>>>>>>
> >>>>>>>>>>> For Ignite schema could be a logical group
of resources
> >>>> (nodes,
> >>>>>>>> memory
> >>>>>>>>>>> pools, caches, etc.). And multiple tables
over multiple
> >>>> caches
> >>>>>>> should
> >>>>>>>>>>> reside in it. To the contrast, table definition
governs
> >> how
> >>>>> data
> >>>>>> is
> >>>>>>>>>> stored.
> >>>>>>>>>>> This is similar to, for example, MySQL approach,
where
> >> you
> >>>>> define
> >>>>>>> how
> >>>>>>>>> you
> >>>>>>>>>>> store data on per-table level, and on schema
level you
> >>> define
> >>>>>> only
> >>>>>>>>> minor
> >>>>>>>>>>> things like collation.
> >>>>>>>>>>>
> >>>>>>>>>>> Vladimir.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Jan 13, 2017 at 10:33 AM, Alexander
Paschenko <
> >>>>>>>>>>> alexander.a.paschenko@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Vova,
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2017-01-13 4:56 GMT+08:00 Vladimir Ozerov
<
> >>>>> vozerov@gridgain.com
> >>>>>>> :
> >>>>>>>>>>>>> I am not quite sure I understand
the idea of "SCHEMA
> >> ==
> >>>>>> cache".
> >>>>>>>>>> Consider
> >>>>>>>>>>>>> some small database with, say, ~30
tables. And user
> >>> wants
> >>>> to
> >>>>>>>> migrate
> >>>>>>>>>> to
> >>>>>>>>>>>>> Ignite. How is he supposed to do
so? 30 schemas
> >> leading
> >>> to
> >>>>>>> rewrite
> >>>>>>>>> of
> >>>>>>>>>>>> all
> >>>>>>>>>>>>> his SQL scripts? Or 30 key-value
pairs in a single
> >> cache
> >>>>>> leading
> >>>>>>>> to
> >>>>>>>>>>>> lack of
> >>>>>>>>>>>>> flexibility and performance problems?
> >>>>>>>>>>>>
> >>>>>>>>>>>> But currently schema *is* semantically
equal to cache
> >>> while
> >>>>>> table
> >>>>>>> is
> >>>>>>>>>>>> equal to type descriptor (i.e. type
of stored entities),
> >>>>> nothing
> >>>>>>> new
> >>>>>>>>>>>> here.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Say, in single cache we may have entities
of types
> >> Person
> >>>> and
> >>>>>>>>>>>> Organization, those map to two tables
with same names,
> >> and
> >>>> can
> >>>>>> be
> >>>>>>>>>>>> accessed within the same cache (i.e.
schema).
> >>>>>>>>>>>>
> >>>>>>>>>>>> If we want to limit the user with having
single type
> >>>>> descriptor
> >>>>>>> per
> >>>>>>>>>>>> cache (i.e. cache has only one type
of stored entities -
> >>>> BTW,
> >>>>>>> where
> >>>>>>>> we
> >>>>>>>>>>>> are with this 2.0-wise?), then this
notion could change.
> >>> But
> >>>>>>>> currently
> >>>>>>>>>>>> what has been suggested already fits
quite good with
> >> what
> >>> we
> >>>>> do
> >>>>>>> have
> >>>>>>>>>>>> at the moment regarding semantic of
SQL objects.
> >>>>>>>>>>>>
> >>>>>>>>>>>> - Alex
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Another example is how to deal with
referene tables?
> >>> Lots
> >>>>>>> database
> >>>>>>>>> has
> >>>>>>>>>>>>> small reference tables which is
best to fit REPLICATED
> >>>>> cache,
> >>>>>>>> while
> >>>>>>>>>>>> others
> >>>>>>>>>>>>> are usually bound to PARTITIONED
mode. "SCHEMA ==
> >> cache"
> >>>>> will
> >>>>>>>> force
> >>>>>>>>>>>> users
> >>>>>>>>>>>>> to split them into separate schemes
leading to poor
> >> user
> >>>>>>>> experience.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I understand that we may have some
implementation
> >>> details
> >>>>>> around
> >>>>>>>> it
> >>>>>>>>> at
> >>>>>>>>>>>> the
> >>>>>>>>>>>>> moment. But from user perspective
"SCHEMA == cache"
> >>>> doesn't
> >>>>>> make
> >>>>>>>>>> sense.
> >>>>>>>>>>>> As
> >>>>>>>>>>>>> we are going towards AI 2.0 we'd
better to rethink
> >> this
> >>>>>>> approach.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Thu, Jan 12, 2017 at 11:46 PM,
Denis Magda <
> >>>>>>> dmagda@apache.org>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Jan 12, 2017, at 12:35
PM, Dmitriy Setrakyan <
> >>>>>>>>>>>> dsetrakyan@apache.org>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Thu, Jan 12, 2017 at
9:47 AM, Sergi Vladykin <
> >>>>>>>>>>>>>> sergi.vladykin@gmail.com>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The xml config was only
for example. We can put in
> >>>> this
> >>>>>>>>>>>> configuration
> >>>>>>>>>>>>>>>> string cache config
parameters directly like this:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> CREATE SCHEMA "MyCacheName"
WITH
> >>>>>>>>>>>>>>>> "cacheMode=REPLICATED;atomicityMode=ATOMIC"
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> This approach makes sense,
if it can be easily
> >>>> supported
> >>>>>> with
> >>>>>>>> H2.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> What’s for affinity keys?
Can we make an exception
> >> for
> >>>> them
> >>>>>> by
> >>>>>>>>>>>> defining in
> >>>>>>>>>>>>>> this part of the statement
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> CREATE TABLE employee (
> >>>>>>>>>>>>>>   id BIGINT PRIMARY KEY,
> >>>>>>>>>>>>>>   dept_id BIGINT AFFINITY KEY,
> >>>>>>>>>>>>>>   name VARCHAR(128),
> >>>>>>>>>>>>>> );
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> or that l
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> CREATE TABLE employee (
> >>>>>>>>>>>>>>   id BIGINT PRIMARY KEY,
> >>>>>>>>>>>>>>   dept_id BIGINT,
> >>>>>>>>>>>>>>   name VARCHAR(128),
> >>>>>>>>>>>>>>   CONSTRAINT affKey AFFINITY
KEY(dept_id)
> >>>>>>>>>>>>>> );
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> ?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> —
> >>>>>>>>>>>>>> Denis
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Sergey Kozlov
> >>> GridGain Systems
> >>> www.gridgain.com
> >>>
> >>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message