incubator-bloodhound-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jure Zitnik <j...@digiverse.si>
Subject Re: [Apache Bloodhound] #404: Populate default schema on product addition
Date Wed, 27 Feb 2013 13:59:39 GMT
On 2/26/13 7:08 PM, Olemis Lang wrote:
> On 2/26/13, Jure Zitnik <jure@digiverse.si> wrote:
>> On 2/26/13 10:01 AM, Olemis Lang wrote:
>>
> [....]
>> I started upgrade scenario discussion in a separate thread last week,
>> subject '[BEP-0003] Custom (3rd party plugin) table upgrade to
>> multi-product'.
>>
> yes , I know ... but at this point I guess we better follow up in here
> . Indeed my last msg was a rewritten version of the reply I was about
> to post in there . If you find some inconsistencies and typos in there
> now you know what is the cause.
>
ok :)
>> jftr, I understand reasoning for trying to find out another way and not
>> go with the O(p) plugin tables. Please note that the current approach
>> has been proposed in BEP-0003 a couple of months ago and that all work
>> done till now was following that design.
> Yes , I get it . I'm just trying to push a little because O(p) order
> of magnitude is not encouraging ... so any effort we make towards
> getting rid of it will be positive afaics

I agree, we need to discuss this for the reasons known. And I agree that 
if we find another way it'll definitely have positive affect.
>> The option of adding 'product'
>> column to 3rd party plugin tables (the same way as for system tables)
>> was assessed when preparing the BEP-0003 database translation proposal.
>> It was dropped for the reasons stated below.
>>
> Well , I know it was discussed but it was not until now that I figured
> out a way that *might* work and was not discussed before . Given the
> importance of the subject IMO we better take a look at it now since
> the beginning .

ok

>> Again, IMO that is not doable mainly because we don't know
>> (semantically) what's in those tables.
> Maybe we don't need to do so ;)
> We only need to know that all plugin data will be repeated per-product
> . Adding an extra product column extending
>
>> We don't know the schema,
> If we could reverse engineer the DB schema at run time (e.g. build the
> very same trac.db.Table schema classes and alike) then this is my
> reasoning

Heh, that sounds a bit like a 'run-time SQL to ORM mapper' to me.

> Let's consider a DB D(T, R, I) like this
>
> T(k, u, c) where k = key columns , u = unique columns , c = all other columns
> R(T1, k1, T2, k2) represents relationships between two such tables
> I(T, i) is an index defined on table T for a set of columns .
> G(T) will be the set of tables holding global resources after migration
> P(T) will be the set of tables holding product-specific resources
> after migration
>
> Considering this , non MP-aware plugin tables will always belong in P
>
> I'm proposing to migrate them by applying a transformation U : D(R, T,
> I) -> D'(R', T', I') like this
>
> T'(k + ('product',) , u, c) for all T in P
> T'(k, u, c) for all T in G
> R'(T1, k1 + ('product',) , T2, k2 + ('product',)) for all T1 , T2 both in P
> R'(T1, k1, T2, k2) otherwise.
> I'(T', i + ('product',)) for I(T, i) and T in P
> I'(T', i) for I(T, i) and T in G

In my understanding, this is exactly what we have been doing with the 
system tables. So basically the proposition you're making is to handle 
custom (3rd party non multi-product aware plugin) tables in a very 
(except for proposed DDL changes below) similar, if not exactly the same 
way, as we do the system tables. Ok, with the global/product tables 
combo if it turns out they're at all required.

> The trick about relationships is already available in our SQL
> translator thanks to the ( SELECT * FROM T where product = 'prefix' )
> hack . So e.g. when executing joins all tables will be at the same
> level i.e. ON T1.product = T2.product constraint will implicit because
> aforementioned statements imply T1.product = T2.product = 'prefix' .
> OTOH relationships involving global resources will not scope those
> tables using product prefix . So table relationships up to this point,
> will be consistent in both cases.

Agree with this part.

> [...]
>>> I'll only care about DB stuff because that's actually what really
>>> matters and the cause of this bottleneck .
>> I'd say that the DB stuff matters, but we can't just ignore other issues.
>>
> yes , I'm trying to focus on this by abstracting all other details .
> E.g. creating files is important but has no relation when it comes to
> DB schema . That's what I mean

ok, let's put other details off the table for now.

>>> [...]
>> How would we be handling indexes, unique keys and other possible
>> constraint, etc. in any of the above cases? Automatically adding
>> 'product' column to unique constraints? I don't believe there's a
>> general rule that we could apply in regards to (but not limited to)
>> constraints...
>>
> Why not ? Think of it this way . You have identical DB schema in two
> separate environments E1, E2 before MP upgrade , let's say Trac
> environments . You'll have index I(T, i) defined on both envs . Now
> abstract the limitations imposed by the underlying Trac architecture ,
> the DB , etc ... and think of this scenario as the *data we have in
> this universe of discurse* so if you want add a dummy product prefix
> E1 to all data in E1 and another to all data in E2 . That will be the
> only artifact you'll have to diferentiate both sets of data when u
> erase the Trac env , the DB and everything else in your mind . In the
> end relational models are plain numbers . Everything else is a
> distraction .
>
>  From a relational perspective what do you have installed in place ?
> You'll have a single concept related to table T working exactly like a
> compound index i + ('product',) . The same will hold for constraints
> and so on . So the only thing we'd need to do is to implement the
> relational equivalent we'd have in such scenario but inside the DB .
> The 'product' column does not add anything new into the scene ;
> neither semantics , nor new constraints , nor anything else (unless
> I'm missing something obvious that I'm hoping you'll be pointing out
> soon ;)
>
> Maybe there are some minor inconsistencies in the definition of the
> upgrade transformation I mentioned above , but in general the idea is
> : modify MP DB schema so that product-specific data data will be
> modeled exactly the same as if everything was in two different
> isolated environments ; 'product' column will not make a particularly
> important difference other than merely multiply resources.
>
> The fact that under certain circumstances plugins should be able to
> leverage resources to be global is out of scope in this discussion .
> We cannot offer anything better than what plugin authors might offer
> unless they take advantage of our new interfaces and architecture .
> However afaics we can really represent plugin data exactly the same as
> if they were on two isolated environments by using a single table .

I understand the idea, that's what's been implemented for the system 
tables. But there is a huge difference between system and custom tables. 
In the first case, we have complete control over things (schema, 
install, upgrade and such). In the case of plugins and custom table, 
unfortunately, we don't have any idea of what the plugin might do (or 
not do) during install/upgrade process. We can't assume it's gonna be 
CREATE/SELECT/INSERT/SELECT sequence you described below. It might 
actually be *anything*. Including DROPs for example. More below...

> [...]
>
>> I'm strongly against recording SQLs
>> and doing replays.
>>
> I'm not saying we should record any SQL and repeat it later . Please
> read below .

Ok, the 'SQL replay' was misunderstanding then.

>> For example, how would MP system (or whoever we choose to do the
>> replays) replay the following really simple SQL, per-product:
>> SELECT * FROM component WHERE name='Product A component';
>> assuming that 'Product A component' is only defined in product 'A' ...
>>
> Good challenge . I take it as this will be inside one of the upgrade
> methods of a non-MP aware plugin and therefore all the SQL is generic
> i.e. not translated .
> Everything I'll mention will happen in the global upgrade loop just
> once on create new env and env upgrade, and later many times on
> product creation
>
> Let's make it a bit more complicated to cover the whole thing . Let
> the process be .
>
> CREATE TABLE new_table [...]
> SELECT * FROM component WHERE name='Product A component'
> INSERT x,y INTO new_table VALUES [...]
> SELECT whatever FROM anywhere WHERE anything [...]
> UPDATE existing_table VALUES [...] WHERE [...]
Another challenge, let's say the upgrade (for example) sequence of non 
multi-product plugin goes like this:

CREATE TABLE new_temp_table [...., newcol] CONSTRAINT(...+newcol)
INSERT INTO new_temp_table (c1, c2, c3, newcol) SELECT c1, c2, c3, 
'whatever' FROM existing_custom_table
DROP TABLE existing_custom_table
CREATE TABLE existing_custom_table [...., newcol] CONSTRAINT(...+newcol)
INSERT INTO existing_custom_table (col1, col2, col3, newcol) SELECT * 
FROM new_temp_table
DROP TABLE new_temp_table

Now, as afaicr sqlite doesn't support table index or constraint changes 
on existing tables, so this would be a 'normal' sequence to add a new 
column to a custom table and include it in the constraint (or index for 
that matter).

> Insert will reuse the values retrieved in SELECT statement
>
> So I'm basically saying this is what should happen :
>
> 1. Global plugin upgrade method is invoked , hence component instantiated in
>     global env scope
> 2. CREATE TABLE and DDL statements will be transformed to prepare the
> table to store
>     data for different products by adding 'product' column and everything else
>     required to make everything work like if we had two isolated environments
>     with the same schema (which you already know as it's in the body of SQL DDL
>     statement) . The SQL translator is responsible for making this work
> this way .
> 3. SELECT statement will be scoped using global env prefix (i.e. product='') so
>     component in global env will be looked up as expected .
> 4. INSERT statement will be scoped using global env prefix (i.e. product='') so
>     product prefix will be appended as expected .
> 5. Any other SELECT statement will be scoped using global env prefix
>     (i.e. product='') as expected .
> 6. UPDATE statement will be scoped using global env prefix (i.e. product='') so
>     product prefix will be appended as expected .
>
> No SQL recorded .

Steps 3-6 are exactly what we're doing now for system tables.

I don't think step 2 is doable unless we find a way on how it should 
work on the sequence above.

> Afterwards while still in global upgrade loop, this so called entity
> we'll have in any case will detect that components C1, C2 , C3 were
> upgraded in global scope (i.e. needs_upgrade and similar stuff
> returned True) . Therefore the *so called entity* may be either
> EnvironmentSetup itself , or a component registered as a setup
> participant (e.g. MP system) or ... well ... anything else we might
> invent running inside the global upgrade loop, though we should need
> such thing . C1 , C2 , C3 will be all non MP-aware components . So it
> will instantiate product envs inside a loop and replay *the upgrade
> procedure* once again something like this .
>
> for p in ALL_PRODUCTS:
>      product_env = ProductEnvironment(env, p)
>
>      for cls in [C1, C2, C3]:
>          c = C1(product_env)
>          c.upgrade_proc(...)

A couple of days ago, in either this or another thread, I proposed to 
invoke the component upgrade within a product environment as this would 
enable the translator to properly kick in and translate component 
tables. It'd also 'automagically' solve some of the dirty details that 
we put off the table above.

> We might take more precautions in here, like using DB transactions ,
> etc ... it's just a simplified version I scketch so that you get my
> point .
> Notice a couple of things :
>
>    - DB conections used by upgrade proc inside the loop
>      will be scoped in product context that will make a difference

I assume that the 'scoped in product context' actually means that the 
upgrade proc will be executed from within the product environment/context.

>    - upgrade_proc is the very same method we mentioned above .
>    - Setup participants in
ok (something missing in the last sentence though)

> So to the point `upgrade_proc` will do things as follows .
>
> 1. Component upgrade method is invoked now in product scope, hence
>     instantiated in product env scope
+1, this is what I'm advocating from the beginning of this discussion
> 2. CREATE TABLE and DDL statements will not be executed because as a
>     precondition the global setup already took care of the necessary
>     precautions and schema will be ready . The SQL translator together
>     with our product DB context managers are responsible for making
>     this work this way .
This is a problem, see the sequence above, we don't have any way of 
knowing what DDLs and for what tables should be turned off and for which 
not.

> 3. SELECT statement will be scoped using product env prefix (i.e.
> product='MYPRODUCT') so
>     component in global env will be looked up as expected.
> 4. INSERT statement will be scoped using product env prefix (i.e.
> product='MYPRODUCT') so
>     product prefix will be appended as expected .
> 5. Any other SELECT statement will be scoped using product prefix
>     (i.e. product='MYPRODUCT') as expected so queries will still
>     operate on the DB row sub-space belonging to
>     the target product . Full look up will work for global tables as
>     translation happens on per-table basis.
> 6. UPDATE statement will be scoped using global env prefix (i.e. product='') so
>     product prefix will be appended as expected . No product prefix for
>     global tables since they won't be translated at all.
Correction, step 6 should also be scoped using product prefix.

This is what we've been doing for system tables (in product scope), as 
mentioned above.

Cheers,
Jure



Mime
View raw message