db-derby-dev mailing list archives

From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: Collation implementation WAS Re: Should COLLATION attribute related code go in BasicDatabase?
Date Thu, 22 Mar 2007 16:42:18 GMT


Mamta Satoor wrote:
> Actually, let me start by asking a store question. Is store going to 
> write the collation type column metadata only if the user has requested 
> TERRITORY_BASED collation ie can 2 newly created 10.3 databases have 
> different column metadata structure(ie with and without collation type 
> info) depending on whether the user has requested TERRITORY_BASED or 
> UCS_BASIC collation?
What I was leaning toward is that every 10.3 version db would write out
the same metadata.  I am not totally sure of the format of that metadata
yet, but logically it can be used to determine the collation id of every
column in every conglomerate, both base tables and indexes.

Databases at a version prior to 10.3 would continue to write out the
same metadata as today.  These databases stay at their version unless
they undergo a hard 10.3 upgrade.

When the hard upgrade happens it is not necessary to upgrade the 
metadata of every conglomerate at the store level.  This metadata all
has a version tag.  Not sure what the number is right now, but let's say
it is 10.0.  In 10.3 the default code will read and write 10.3 version
metadata.  There will also be some code that can read 10.0 version 
metadata, and the easiest thing to do is to read it into a 10.3 version
class so that the rest of the code need not know the difference.  Pre-10.3
databases running in soft upgrade mode should continue to write out 10.0
version metadata.
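
To make the read path concrete, here is a rough sketch in Java.  It is
only an illustration: the class name, the version tag values, the
collation id values and the byte layout are all made up for this mail
and are not what store does today.  It assumes old format metadata has
no collation information and so reads as UCS_BASIC for every column.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch; none of these names or values come from the Derby source.
final class ConglomerateMetadataSketch {
    // Made-up version tags and collation ids, for illustration only.
    static final int VERSION_10_0 = 100;
    static final int VERSION_10_3 = 103;
    static final int UCS_BASIC = 0;
    static final int TERRITORY_BASED = 1;

    final int[] formatIds;     // one format id per column
    final int[] collationIds;  // one collation id per column

    ConglomerateMetadataSketch(int[] formatIds, int[] collationIds) {
        if (formatIds.length != collationIds.length) {
            throw new IllegalArgumentException("need one collation id per column");
        }
        this.formatIds = formatIds.clone();
        this.collationIds = collationIds.clone();
    }

    // Layout: version tag, column count, format ids, and (10.3 only) collation ids.
    // A soft upgraded db would pass VERSION_10_0 and keep writing the old layout.
    byte[] toBytes(int version) {
        if (version != VERSION_10_0 && version != VERSION_10_3) {
            throw new IllegalArgumentException("unknown metadata version " + version);
        }
        boolean withCollation = (version == VERSION_10_3);
        int ints = 2 + formatIds.length + (withCollation ? collationIds.length : 0);
        ByteBuffer buf = ByteBuffer.allocate(4 * ints);
        buf.putInt(version).putInt(formatIds.length);
        for (int id : formatIds) {
            buf.putInt(id);
        }
        if (withCollation) {
            for (int id : collationIds) {
                buf.putInt(id);
            }
        }
        return buf.array();
    }

    // Reads either layout into the same in-memory class, so callers
    // never need to know which version was on disk.
    static ConglomerateMetadataSketch fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int version = buf.getInt();
        if (version != VERSION_10_0 && version != VERSION_10_3) {
            throw new IllegalArgumentException("unknown metadata version " + version);
        }
        int n = buf.getInt();
        int[] formats = new int[n];
        for (int i = 0; i < n; i++) {
            formats[i] = buf.getInt();
        }
        int[] collations = new int[n];
        if (version == VERSION_10_3) {
            for (int i = 0; i < n; i++) {
                collations[i] = buf.getInt();
            }
        } else {
            // Old format has no collation info: treat every column as UCS_BASIC.
            Arrays.fill(collations, UCS_BASIC);
        }
        return new ConglomerateMetadataSketch(formats, collations);
    }

    public static void main(String[] args) {
        ConglomerateMetadataSketch m = new ConglomerateMetadataSketch(
                new int[] {1, 2}, new int[] {TERRITORY_BASED, UCS_BASIC});
        // 10.3 layout keeps the per-column collation ids: prints [1, 0]
        System.out.println(Arrays.toString(fromBytes(m.toBytes(VERSION_10_3)).collationIds));
        // 10.0 layout drops them, so the reader defaults to UCS_BASIC: prints [0, 0]
        System.out.println(Arrays.toString(fromBytes(m.toBytes(VERSION_10_0)).collationIds));
    }
}
```

The point of the single read method is that only it knows about the two
on-disk versions; everything downstream sees one class.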
>  
> Mamta
>  
> On 3/21/07, *Mike Matrigali* <mikem_app@sbcglobal.net> wrote:
> 
> 
> 
>     Mamta Satoor wrote:
>      > 2)At the time of upgrade of pre-10.3 database, we should make
>     sure that
>      > derby.database.collation property with value UCS_BASIC is added to
>      > services.properties. This is because we do not plan on supporting
>      > collation change for existing databases.
>     Is this required?  How does the code handle a soft upgrade database
>     where this property is not set?  Could you say what you plan to do
>     in both the hard and soft upgrade cases?
> 
>     I was assuming that only new databases would be affected and that
>     somehow new code would just work on existing databases with no upgrade
>     changes at all.  So something like no collation property at all
>     would be interpreted as UCS_BASIC.  And of course old format SYSCOLUMN
>     entries would be valid as well as old format conglomerate store
>     metadata.
>      >
>      >
>      > On 3/20/07, *Mamta Satoor* <msatoor@gmail.com> wrote:
>      >
>      >     Thanks, Mike and Dan for your responses. Based on this and
>     following
>      >     from Dan's first mail in this thread
>      >     ******start of part of Dan's first mail in this thread*******
>      >     - basic database sets the locale for the DataValueFactory
>     after it
>      >     boots it, using a new method on DVF
>      >             void setLocale(Locale locale);
>      >     ******end of part of Dan's first mail in this thread*******
>     I may have missed this, is locale information already available
>     from services.properties?  For the store boot issue store will provide
>     format id and collation id, but I believe you need locale information
>     to determine the RuleBasedCollator and it can't depend on anything in
>     the property conglomerate.
> 
>      >     we do not need the collation attribute information at the DVF boot
>      >     time. It is sufficient to have locale info set on DVF at the
>     boot
>      >     time using setLocale method by basic database. If store code
>     calls
>      >     DVF to give proper DVD using formatid and collation type, DVF can
>      >     determine the correct RuleBasedCollator using the locale if the
>      >     collation type is territory based. So, DVF has everything it
>     needs
>      >     to find the correct RuleBasedCollator for given collation type.
>      >
>      >     I will go ahead and remove the following requirement from
>      >     Outstanding items under
>      >     http://wiki.apache.org/db-derby/BuiltInLanguageBasedOrderingDERBY-1478
>      >     1)Add jdbc url attribute COLLATION into services.properties as
>      >     derby.database.collation property. If no COLLATION is
>     specified at
>      >     database create time, then have UCS_BASIC as the value for
>      >     derby.database.collation. We need the property in the
>      >     services.properties rather than properties conglomerate because
>      >     DataValueFactory <
>     http://wiki.apache.org/db-derby/DataValueFactory>
>      >     needs this property before store has been booted completely.
>      >
>      >     In addition, I will add an entry as follows under Implemented
>     Items
>      >     on
>      >     http://wiki.apache.org/db-derby/BuiltInLanguageBasedOrderingDERBY-1478
>      >     At database create time, the optional JDBC url attribute
>      >     COLLATION is validated by the boot code in data dictionary
>     and the
>      >     validated value of COLLATION(if none specified by user, then
>     it will
>      >     default to UCS_BASIC which is also the only collation
>     available on
>      >     pre-10.3 databases) attribute is saved as
>     derby.database.collation
>      >     property in the properties conglomerate. This work was done by
>      >     revision 511283
>      >
>      >     As always, any feedback is welcomed,
>      >     Mamta
>      >
>      >     On 3/20/07, *Mike Matrigali* <mikem_app@sbcglobal.net> wrote:
>      >
>      >
>      >
>      >         Mamta Satoor wrote:
>      >>  Mike, I am not sure if your question, about how in store DVD with
>      >>  correct collation type is loaded, was answered or not. In
>      >         other
>      >>  words, you had question about following piece of pseudo code
>      >         from Dan
>      >>      if (dvd instanceof StringDataValue)
>      >>              dvd = dvd.getValue(dvf.getCharacterCollator(type));
>      >>
>      >>  Let me attempt to answer it. It will help clear up things in
>      >         my mind too
>      >>  and make sure that I am understanding this correctly.
>      >>
>      >>  Currently,
>      >>  derby.impl.store.access.conglomerate.OpenConglomerateScratchSpace has
>      >>  get_row_for_export which first gets a class template row using
>      >>  RowUtil.newClassInfoTemplate. This method in RowUtil calls
>      >>  Monitor.classFromIdentifier to get the InstanceGetter for each
>      >         of the
>      >>  format ids identified by store. Once
>      >>  OpenConglomerateScratchSpace.get_row_for_export has the class
>      >         template
>      >>  row, it will call RowUtil.newRowFromClassInfoTemplate. This is the
>      >>  method Dan is proposing to modify, i.e. store should pass an
>      >         additional
>      >>  array of int to  RowUtil.newRowFromClassInfoTemplate which
>      >         will have the
>      >>  collation type associated with the formatids of the template row.
>      >>  RowUtil.newRowFromClassInfoTemplate will first get the DVD as
>      >         it does
>      >>  today using following
>      >>      columns[column_index] = (DataValueDescriptor)
>      >>              classinfo_template[column_index].getNewInstance();
>      >>  In addition, it will need to do something like the following
>      >>      if (columns[column_index] instanceof StringDataValue)
>      >>              dvd = columns[column_index].getValue(
>      >>                      dvf.getCharacterCollator(collationTypesForTemplateRows[column_index]));
>      >
>      >         My opinion is that this work should be done in the datavalue
>      >         factory and
>      >         not outside.  Dan suggested at one point that some of the
>     work of
>      >         generating classes/instances should move from Monitor to
>      >         datavalue factory.
>      >
>      >         So I was assuming something like RowUtil.newClassInfoTemplate
>      >         instead
>      >         of calling Monitor.classFromIdentifier(format_ids[i]) get an
>      >         array of
>      >         InstanceGetter's, it would call something like
>      >         datavaluefactory.classFromIdentifier(format_ids[i],
>      >         collator_ids[i]) -
>      >         then every InstanceGetter would produce the right type with
>      >         collator set
>      >         from then on.
>      >
>      >
>      >         Internal to dvf it can do the work of checking for
>     instanceof if it
>      >         needs to, but because it is inside dvf maybe it can do
>     something
>      >         smarter .
>      >>
>      >>  Dan, let me know if I understood you right. This will help me
>      >         answer
>      >>  your question on the Derby wiki page
>      >>
>      >>  http://wiki.apache.org/db-derby/BuiltInLanguageBasedOrderingDERBY-1478
>      >>  I know that we don't need to get into the implementation code
>      >         details in
>      >>  the design phase, but I need to be able to picture this
>      >         particular case
>      >>  in my mind to understand where I am going.
>      >>
>      >>  thanks,
>      >>  Mamta
>      >>
>      >>
>      >>  On 3/15/07, *Mike Matrigali* <mikem_app@sbcglobal.net> wrote:
>      >>
>      >>
>      >>
>      >>     Daniel John Debrunner wrote:
>      >>      > Mamta Satoor wrote:
>      >>      >
>      >>     ...
>      >>
>      >>      >
>      >>      > - At recovery time the btree uses the collation type and
>      >         the data
>      >>     value
>      >>      > factory to setup its template row array correctly.
>      >         Something like
>      >>      >      for each dvd in row array
>      >>      >         if (dvd instanceof StringDataValue)
>      >>      >              dvd = dvd.getValue(dvf.getCharacterCollator
>      >         (type));
>      >>
>      >>     Note that the store issue is not just a recovery time
>      >         issue, templates
>      >>     are required during normal runtime.  Creation of these
>      >         templates used
>      >>     to show up (a long time ago) in performance analysis and
>      >         work was done
>      >>     to optimize the performance.  So I am interested in making
>      >         these
>      >>     template creations as efficient as possible.
>      >>
>      >>     Your proposal above does not look right to me - it could
>      >         just be I don't
>      >>     understand where the pseudo code is.  The code I expect in
>      >         store would
>      >>     be something like below - letting the datafactory do
>      >         whatever is right
>      >>     based on the format id and the collation, if store is going
>      >         to "own"
>      >>     knowing
>      >>     the collation of a given column then I would expect
>      >         something like:
>      >>
>      >>     for each format id in row array
>      >>         dvd = datavaluefactory.getObject(format id,
>      >         character_collator_type)
>      >>
>      >>     note this means extra overhead for every object creation in
>      >         the
>      >>     template.
>      >>
>      >>     To me it seems unfortunate to pass in this info per column,
>      >         when at
>      >>     least in 10.3 the current code it is one per database.  I
>      >         saw the
>      >>     direction as:
>      >>
>      >>     o 10.3 only needs one collation per database so hide the
>      >         info in the
>      >>       datafactory, basically there is one DEFAULT collation per
>      >         database.
>      >>       Thus no need for second argument to
>      >         datavaluefactory.getObject ()
>      >>
>      >>     o future release needs to have different collations per
>      >         conglomerate,
>      >>       then at that time we can store a collator type per
>      >         conglomerate - we
>      >>       have mechanism today to upgrade on the fly.  If we want
>      >         to support
>      >>       adding a collation to an existing database I would
>      >         suggest continuing
>      >>       the DEFAULT collation concept with some magic number
>      >         representing
>      >>       DEFAULT db collation in the datavaluefactory.getObject ()
>      >         call - which
>      >>       would mean use db wide default rather than specify
>      >         specific one. For
>      >>       new databases we would not need default, we could at that
>      >         time
>      >>     specify
>      >>       one per conglomerate.
>      >>       At this point we either change all the
>      >         datavaluefactory.getObject()
>      >>       calls to have 2 args and support DEFAULT_VALUE as second
>      >         argument, or
>      >>       maybe support both 1 and 2 arg calls - not sure.
>      >>
>      >>     o future future release needs to have different collations
>      >         per column,
>      >>       then at that time we can store a collator type per column
>      >         - we
>      >>     continue to have mechanism to upgrade on fly as long as we
>      >         can come up
>      >>     with a default value for old tables.  Same issues as above.
>      >>
>      >>
>      >>
>      >>      >
>      >>      > - setting the collation property remains in the data
>      >         dictionary
>      >>      >
>      >>      > - basic database sets the locale for the
>      >         DataValueFactory after
>      >>     it boots
>      >>      > it, using a new method on DVF
>      >>      >         void setLocale(Locale locale);
>      >>      >
>      >>      > I think approaching the problem this way will lead to a
>      >         cleaner
>      >>     solution
>      >>      > in the long term and be somewhat easier to implement.
>      >>      >
>      >>      > Thanks,
>      >>      > Dan.
>      >>      >
>      >>      >
>      >>      >
>      >>      >
>      >>      >
>      >>      >
>      >>
>      >>
>      >
>      >
>      >
> 
> 

