geronimo-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Brendan W.McAdams <>
Subject Re: Automatic Key Generation / Getting Involved
Date Wed, 31 Dec 2003 14:44:16 GMT
I've Cross posted to OpenEJB list as it sounds like this discussion  
should be moved there... The question being support for DB Level  
automatic key generation (mysql autoincrement, sybase/ms-sql identity,  
postgresql serial, etc) in the CMP system - will it be supported and  
how can I get involved in helping with this process...

The original question is at:

On Dec 30, 2003, at 9:34 PM, Jeremy Boynes wrote:

> This is really part of OpenEJB Nova so you might want to take it to  
> that  list but it is probably of interest  
> here as well :)

OK; I wasn't aware that OpenEJB was being used for the container system  
- very cool.  So I should get involved with the OpenEJB list for  
development of Nova? I take it the CVS for that resides @ OpenEJB also  
(I'm going through the process of getting my home dev box redone and  
installing maven now)?

I think it's clear I should spend more time playing and reading before  
I post any further boneheaded questions ... but I'll do it anyway:

> Support for unknown-pk is crap right now - I knocked something  
> together for ApacheCon that just used a random number but that is not  
> a solution.

Thinking about what is done and how, I'm inclined to agree now on this  
point.  Here's the big question:

Assuming the development efforts were willing to go in the direction of  
supporting this functionality, taking a general poll - is it better  
when dealing with things like supporting DB Generated keys to have a  
'generic methodology' which is by definition less flexible (like what  
JBoss does), or a database specific support  structure that relies upon  
coding a driver for each database type (Like BEA Weblogic does)?  As  
much as it annoys me, my leaning is towards DB Specific support, with  
an optional 'generic' for unsupported DBs.  Using each databases  
functionality for a feature like this gives you the most flexibility,  
and I can speak from experience that DBAs really consider it A Good  
Thing (tm) when you use DB specific features, and allow the database to  
manage key generation Angry Marauding DBAs == bad].

> There are a few basic techniques that need to be covered:
> 1) pre-insert generation. This might just be other implementations
>    of o.o.n.entity.cmp.PrimaryKeyFactory that generate a PK somehow,
>    inject it into the InstanceData and return the value. Examples
>    could be a caching hi-lo key generator or something that issued
>    a JDBC operation to get the value

I think if we go along the lines of database specific drivers, this is  
not the way to go.  As a general rule, Geronimo/Nova really is going to  
*need* to allow the database to generate it's own keys for the most  
flexibility.  Our DBAs here have pushed pretty hard at us to begin  
using database generated keys - previously (Before figuring out how to  
get Weblogic to use Sybase like MS-SQL Server in the case of Identity  
columns) we were calling a standard sequence generation stored  
procedure in ejbCreate and doing a setID()  - this worked well but  
reduces the DBAs flexibility to adjust ID memory caching and key  
sequentiality.  It drives them batty to have us generating our own keys  
- I can't imagine they are the only DBAs in the worldwho are willing to  
fight  against CMP for just this reason (On that note - the ability to  
audit the database calls the CMPs are generating would be a good  

The above option may be good for a generic option on unsupported RDBMS'  
- I know the aforementioned Weblogic 'generic' method uses a key cache  
where it can pre-select a set number (default 100) of keys from the  
sequence table so that it doesn't have to always go back to the  
database each time it generates a key.  This is of course facilitated  
by it having an exclusive hold on the sequence table.
> 2) insert-side effect, where the value is generated as a consequence
>    of performing the insert operation. Examples would be a MySQL auto-
>    increment, JDBC getGeneratedKeys(), or Oracle with INSERT-RETURNING.
>    The simple solution (KISS) is to always insert between ejbCreate and
>    ejbPostCreate but that can mean some transactions insert then
>    immediately update; the more complex solution would be to defer the
>    insert for as long as possible, but that means being able to cope
>    and reference to an EJB whose identity is not known. A compromise
>    would be to delay until after ejbPostCreate which at least gives the
>    user the chance to initialize any CMRs. This should be selectable by
>    simply defining alternative VOPs for the create.

This is absolutely, IMNSHO the best route to go.  Most modern RDBMS'  
have their own on-insert fill-in-the-primary-key-column generation  
method, which, as mentioned above, DBAs prefer we use for various  
reasons.  In addition to making our DBAs happy, using the DB specific  
functionality makes our lives a LOT easier, by allowing the DB to do  
what it does best - manage the data we are trusting it to store.  The  
only question for how to support Database specific functionality for  
this would be I think, how do we do it?  I see two primary methods:

1) Something similar to JBoss' "unknown-pk" method - essentially,  
declare that the primary key will be database generated, and that it is  
a field of type 'X'.  There then could be a section in the  
geronimo-cmp-rdbms.xml that declares a block of SQL or functions that  
must be called post-insert to find out the generated key.  I'm not sure  
however just how 'optimal' this could be.

2) The Weblogic route.  Declare in the geronimo-cmp-rdbms.xml file that  
the primary key is auto-generated on insert, and give specifics on what  
the database is.  This would require of course that we have drivers for  
the major RDBMs, and document how to setup and interface their key  
generation methods within the CMP descriptors.  Albeit less generic, I  
think this is the best route personally - I'm not sure I have enough  
experience to make that declaration however.

> I'd start by having a look at the create VOPs and feel free to bounce  
> any questions off the list.
Mind pointing me in the right direction on this?  I am fairly new to  
the game - not 100% where to start, although as aforementioned more  
than willing to jump in head first...  If we can get a good feel for  
the preferred generation methodologies, I can start in on writing code  
- I have mysql, sybase 12.5.1 and postgresql setup at home that I can  
test/code against.

Anyone got any other thoughts on the subject? 

View raw message