incubator-bloodhound-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Olemis Lang <ole...@gmail.com>
Subject Re: [Apache Bloodhound] #404: Populate default schema on product addition
Date Tue, 26 Feb 2013 18:08:38 GMT
On 2/26/13, Jure Zitnik <jure@digiverse.si> wrote:
> On 2/26/13 10:01 AM, Olemis Lang wrote:
>> On 2/25/13, Jure Zitnik <jure@digiverse.si> wrote:
>>> On 2/24/13 7:07 AM, Olemis Lang wrote:
>>>> On 2/22/13, Jure Zitnik <jure@digiverse.si> wrote:
[....]
>>
>>  ... but your initial scenarios
>> are just find to get this sorted out .
>
> I started upgrade scenario discussion in a separate thread last week,
> subject '[BEP-0003] Custom (3rd party plugin) table upgrade to
> multi-product'.
>

yes , I know ... but at this point I guess we better follow up in here
. Indeed my last msg was a rewritten version of the reply I was about
to post in there . If you find some inconsistencies and typos in there
now you know what is the cause.

>>> I'll
>>> describe scenarios for both cases, first for non multi-product aware
>>> component and then for multi-product aware component.
>>>
>> I'll follow this sequence with the purpose in mind of finding a way to
>> work around the current design limitation of having O(p) plugin tables
>> .
>>
>> So I'm basically thinking of having one of the following DB schemas :
>>
>>    1. A single table scoping all plugin resources at product level
>>    2. Two tables : one unchanged for products in global scope ,
>>        the second one for product resources .
>
> jftr, I understand reasoning for trying to find out another way and not
> go with the O(p) plugin tables. Please note that the current approach
> has been proposed in BEP-0003 a couple of months ago and that all work
> done till now was following that design.

Yes , I get it . I'm just trying to push a little because O(p) order
of magnitude is not encouraging ... so any effort we make towards
getting rid of it will be positive afaics

> The option of adding 'product'
> column to 3rd party plugin tables (the same way as for system tables)
> was assessed when preparing the BEP-0003 database translation proposal.
> It was dropped for the reasons stated below.
>

Well , I know it was discussed but it was not until now that I figured
out a way that *might* work and was not discussed before . Given the
importance of the subject IMO we better take a look at it now since
the beginning .

> The proposed 2 options for the database schema are not fundamentally
> different.

+1

> We could go with single table by adding the 'product' column
> and scoping the table the same way as we do with the system resource
> tables. I believe that's (1) in your list. In both scenarios we run into
> the same obstacles as explained below.
>

ok

> Again, IMO that is not doable mainly because we don't know
> (semantically) what's in those tables.

Maybe we don't need to do so ;)
We only need to know that all plugin data will be repeated per-product
. Adding an extra product column extending

> We don't know the schema,

If we could reverse engineer the DB schema at run time (e.g. build the
very same trac.db.Table schema classes and alike) then this is my
reasoning

Let's consider a DB D(T, R, I) like this

T(k, u, c) where k = key columns , u = unique columns , c = all other columns
R(T1, k1, T2, k2) represents relationships between two such tables
I(T, i) is an index defined on table T for a set of columns .
G(T) will be the set of tables holding global resources after migration
P(T) will be the set of tables holding product-specific resources
after migration

Considering this , non MP-aware plugin tables will always belong in P

I'm proposing to migrate them by applying a transformation U : D(R, T,
I) -> D'(R', T', I') like this

T'(k + ('product',) , u, c) for all T in P
T'(k, u, c) for all T in G
R'(T1, k1 + ('product',) , T2, k2 + ('product',)) for all T1 , T2 both in P
R'(T1, k1, T2, k2) otherwise.
I'(T', i + ('product',)) for I(T, i) and T in P
I'(T', i) for I(T, i) and T in G

The trick about relationships is already available in our SQL
translator thanks to the ( SELECT * FROM T where product = 'prefix' )
hack . So e.g. when executing joins all tables will be at the same
level i.e. ON T1.product = T2.product constraint will implicit because
aforementioned statements imply T1.product = T2.product = 'prefix' .
OTOH relationships involving global resources will not scope those
tables using product prefix . So table relationships up to this point,
will be consistent in both cases.

> we
> don't know what data it could aggregate and/or reference,

we don't know all the details that's a fact , but we know that product
addition will not change the nature of the relationships present in
former non-MP setup

> we don't know
> what changes to the schema and data could be performed during the
> upgrade etc.
>

Product column has no semantics and should not affect any of these.
Think of it this way : we just put the whole resource container in an
array .

>
>> I'll only care about DB stuff because that's actually what really
>> matters and the cause of this bottleneck .
>
> I'd say that the DB stuff matters, but we can't just ignore other issues.
>

yes , I'm trying to focus on this by abstracting all other details .
E.g. creating files is important but has no relation when it comes to
DB schema . That's what I mean
;)

>>> 1. Global environment creation
>>> -- Non multi-product aware component
>>> In this case, the components
>>> 'IEnvironmetSetupParticipant.environment_created' should be executed in
>>> global environment only (as there are no products). Result of this would
>>> be 'my_table' table with aggregated/referenced data from 'null' product
>>> scope (global scope). Note that the translator does not prefix table
>>> names when product is 'null'. File that the component creates would be
>>> stored in the global environment path, the 'system' table would get a
>>> record with 'null' product prefix.
>>>
>>> [...]
>> Looking at this with the previous goals in mind , IMO at this point we
>> should be doing this (numbers will match scenarios listed above)
>>
>>    1. create plugin table with extra 'product' column for prefix
>>        and perform all other outstanding DDL and DML statements
>>        based on this modified schema .
>>    2. create plugin table with extra 'product' prefix column
>
> How would we be handling indexes, unique keys and other possible
> constraint, etc. in any of the above cases? Automatically adding
> 'product' column to unique constraints? I don't believe there's a
> general rule that we could apply in regards to (but not limited to)
> constraints...
>

Why not ? Think of it this way . You have identical DB schema in two
separate environments E1, E2 before MP upgrade , let's say Trac
environments . You'll have index I(T, i) defined on both envs . Now
abstract the limitations imposed by the underlying Trac architecture ,
the DB , etc ... and think of this scenario as the *data we have in
this universe of discurse* so if you want add a dummy product prefix
E1 to all data in E1 and another to all data in E2 . That will be the
only artifact you'll have to diferentiate both sets of data when u
erase the Trac env , the DB and everything else in your mind . In the
end relational models are plain numbers . Everything else is a
distraction .

>From a relational perspective what do you have installed in place ?
You'll have a single concept related to table T working exactly like a
compound index i + ('product',) . The same will hold for constraints
and so on . So the only thing we'd need to do is to implement the
relational equivalent we'd have in such scenario but inside the DB .
The 'product' column does not add anything new into the scene ;
neither semantics , nor new constraints , nor anything else (unless
I'm missing something obvious that I'm hoping you'll be pointing out
soon ;)

Maybe there are some minor inconsistencies in the definition of the
upgrade transformation I mentioned above , but in general the idea is
: modify MP DB schema so that product-specific data data will be
modeled exactly the same as if everything was in two different
isolated environments ; 'product' column will not make a particularly
important difference other than merely multiply resources.

The fact that under certain circumstances plugins should be able to
leverage resources to be global is out of scope in this discussion .
We cannot offer anything better than what plugin authors might offer
unless they take advantage of our new interfaces and architecture .
However afaics we can really represent plugin data exactly the same as
if they were on two isolated environments by using a single table .

[...]
>>> from within
>>> 'ProductEnvironment('MYPRODUCT')'.
>> Nope . I disagree here . IMO , this should be happeing immediately
>> after creating the product either in MP system's CRUD handler , a
>> dedicated implementation of a product resource listener or a
>> combination of both .
>>
> I understand this, though I don't agree for the reasons I stated above.
> In my opinion, and I think Andrej also mentioned this in his reply, from
> our perspective what happens within component's
> 'IEnvironmentSetupParticipant' method is a black box. We can do the SQL
> translation through our proxy but I'm strongly against recording SQLs
> and doing replays.
>

I'm not saying we should record any SQL and repeat it later . Please
read below .

> For example, how would MP system (or whoever we choose to do the
> replays) replay the following really simple SQL, per-product:
> SELECT * FROM component WHERE name='Product A component';
> assuming that 'Product A component' is only defined in product 'A' ...
>

Good challenge . I take it as this will be inside one of the upgrade
methods of a non-MP aware plugin and therefore all the SQL is generic
i.e. not translated .
Everything I'll mention will happen in the global upgrade loop just
once on create new env and env upgrade, and later many times on
product creation

Let's make it a bit more complicated to cover the whole thing . Let
the process be .

CREATE TABLE new_table [...]
SELECT * FROM component WHERE name='Product A component'
INSERT x,y INTO new_table VALUES [...]
SELECT whatever FROM anywhere WHERE anything [...]
UPDATE existing_table VALUES [...] WHERE [...]

Insert will reuse the values retrieved in SELECT statement

So I'm basically saying this is what should happen :

1. Global plugin upgrade method is invoked , hence component instantiated in
   global env scope
2. CREATE TABLE and DDL statements will be transformed to prepare the
table to store
   data for different products by adding 'product' column and everything else
   required to make everything work like if we had two isolated environments
   with the same schema (which you already know as it's in the body of SQL DDL
   statement) . The SQL translator is responsible for making this work
this way .
3. SELECT statement will be scoped using global env prefix (i.e. product='') so
   component in global env will be looked up as expected .
4. INSERT statement will be scoped using global env prefix (i.e. product='') so
   product prefix will be appended as expected .
5. Any other SELECT statement will be scoped using global env prefix
   (i.e. product='') as expected .
6. UPDATE statement will be scoped using global env prefix (i.e. product='') so
   product prefix will be appended as expected .

No SQL recorded .

Afterwards while still in global upgrade loop, this so called entity
we'll have in any case will detect that components C1, C2 , C3 were
upgraded in global scope (i.e. needs_upgrade and similar stuff
returned True) . Therefore the *so called entity* may be either
EnvironmentSetup itself , or a component registered as a setup
participant (e.g. MP system) or ... well ... anything else we might
invent running inside the global upgrade loop, though we should need
such thing . C1 , C2 , C3 will be all non MP-aware components . So it
will instantiate product envs inside a loop and replay *the upgrade
procedure* once again something like this .

for p in ALL_PRODUCTS:
    product_env = ProductEnvironment(env, p)

    for cls in [C1, C2, C3]:
        c = C1(product_env)
        c.upgrade_proc(...)

We might take more precautions in here, like using DB transactions ,
etc ... it's just a simplified version I scketch so that you get my
point .
Notice a couple of things :

  - DB conections used by upgrade proc inside the loop
    will be scoped in product context that will make a difference
  - upgrade_proc is the very same method we mentioned above .
  - Setup participants in

So to the point `upgrade_proc` will do things as follows .

1. Component upgrade method is invoked now in product scope, hence
   instantiated in product env scope
2. CREATE TABLE and DDL statements will not be executed because as a
   precondition the global setup already took care of the necessary
   precautions and schema will be ready . The SQL translator together
   with our product DB context managers are responsible for making
   this work this way .
3. SELECT statement will be scoped using product env prefix (i.e.
product='MYPRODUCT') so
   component in global env will be looked up as expected.
4. INSERT statement will be scoped using product env prefix (i.e.
product='MYPRODUCT') so
   product prefix will be appended as expected .
5. Any other SELECT statement will be scoped using product prefix
   (i.e. product='MYPRODUCT') as expected so queries will still
   operate on the DB row sub-space belonging to
   the target product . Full look up will work for global tables as
   translation happens on per-table basis.
6. UPDATE statement will be scoped using global env prefix (i.e. product='') so
   product prefix will be appended as expected . No product prefix for
   global tables since they won't be translated at all.

[...]
>
> I don't understand what you meant with 'shut off DDL statements'?

Typo : turn off

> If
> nothing else, all DDLs would need to be translated in such a way to take
> into account the 'product' column.
>

see above

[...]

-- 
Regards,

Olemis.

Mime
View raw message