db-derby-dev mailing list archives

From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: how should store get an object based on format id and collation id?
Date Mon, 16 Apr 2007 16:19:10 GMT
My views on this are quoted by Mamta below.  I was hoping that
comparisons involving collated chars would not require creating twice
the number of objects.

The quoted text below describes how store currently uses InstanceGetter
to optimize allocation of objects.  I was hoping to preserve this performance
for current non-collation datatypes and also to avoid needing to
provide any additional collation information after the initial
dvf.instanceGetterFromIdentifiers call.
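
To make the pattern concrete, here is a minimal sketch of "resolve the
collation once, then allocate many times".  Everything in it is a
simplified stand-in and an assumption on my part: the Dvd, InstanceGetter
and Dvf types, the collation id constants, and the use of a lambda are
illustrative only and are not Derby's real classes or signatures.  Only the
method names instanceGetterFromIdentifiers and getNewInstance come from
this thread.

```java
import java.text.Collator;
import java.util.Locale;

public class InstanceGetterSketch {

    // Simplified stand-in for a DataValueDescriptor (DVD); not Derby's interface.
    interface Dvd {
        void setValue(String s);
        String getString();
        int compare(Dvd other);
    }

    // Simplified stand-in for the InstanceGetter discussed in the thread.
    interface InstanceGetter {
        Dvd getNewInstance();
    }

    // Hypothetical collation ids, chosen only for this sketch.
    static final int COLLATION_BASIC = 0;
    static final int COLLATION_TERRITORY = 1;

    // Default character value: compares with String.compareTo.
    static class BasicChar implements Dvd {
        protected String value;
        public void setValue(String s) { value = s; }
        public String getString() { return value; }
        public int compare(Dvd other) { return value.compareTo(other.getString()); }
    }

    // Collation-sensitive character value: compares with a shared Collator.
    static class CollatedChar extends BasicChar {
        private final Collator collator;
        CollatedChar(Collator collator) { this.collator = collator; }
        @Override
        public int compare(Dvd other) { return collator.compare(value, other.getString()); }
    }

    // Stand-in for a per-database DVF that owns the database's Collator.
    static class Dvf {
        private final Collator collator;
        Dvf(Locale territory) { this.collator = Collator.getInstance(territory); }

        // The collation decision is made here, once. The sketch has a single
        // character type, so formatId is accepted but not used.
        InstanceGetter instanceGetterFromIdentifiers(int formatId, int collationId) {
            if (collationId == COLLATION_TERRITORY) {
                return () -> new CollatedChar(collator);
            }
            return BasicChar::new;
        }
    }

    public static void main(String[] args) {
        Dvf dvf = new Dvf(Locale.ENGLISH);

        // Called once per scan or sort.
        InstanceGetter getter = dvf.instanceGetterFromIdentifiers(1, COLLATION_TERRITORY);

        // Called once per row; no collation information is passed again.
        Dvd a = getter.getNewInstance();
        Dvd b = getter.getNewInstance();
        a.setValue("a");
        b.setValue("B");
        System.out.println("collated compare < 0: " + (a.compare(b) < 0));
    }
}
```

The point of the design is that each getNewInstance call allocates exactly
one object, already of the right kind, whether or not the column is
collation sensitive.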

Just so I know where we are: Dan, do you have a problem with the
proposed interfaces, i.e., are they in the right place and taking
the right arguments?  If they are, maybe we could incrementally implement
the interfaces so that I could continue the store side while the
implementation discussion continues.  I would be ok with an initial
interface change that only supported current collation, so that
I could at least verify the store changes.

Mamta, are you close to an implementation?  If so, maybe you could post
a patch so that I could work off of that while discussion continues.

Mamta Satoor wrote:
> Hi Dan,
>  
> Here are my attempts to answer your questions.
>  
> "Why use InstanceGetter here?" Because Store wants to obtain the 
> InstanceGetter once and call getNewInstance on it multiple times. This is 
> for efficiency reasons. This is what is currently done but through 
> interfaces on Monitor rather than DVF. Mike, maybe you can share your 
> thoughts too on why Store does this.
> 
>  
> "It doesn't have to return another DVD, it can return itself if it is of
> the correct type, thus no additional overhead for UCS_BASIC collation.
> Thus this switch would happen once for the first collation, not every
> collation, and of course not happen at all if no collation is involved."
> I agree, but with the InstanceGetter approach, it doesn't even have to 
> happen once because we will be generating the right DVD in the first place.
>  
> "Could you show an example of how the store will be calling the code you
> are describing? Maybe that would help me out."
> Store would call something like the following (this is copied from what Mike 
> wrote in this same thread, dated April 12th, 2nd mail from Mike, point 
> 3.) Again, Mike if you have more to add from the Store point of view, 
> please do so.
> 
>    Store will call the following once:
>    InstanceGetter = dvf.instanceGetterFromIdentifiers(format id, collation id)
> 
>    Store will call the following many times:
>    dvd = InstanceGetter.getNewInstance()
>  
> The reason for doing it this way is explained by Mike below
>  
> "3) optimized allocation, caching some of the work.  This is used
>    where one query may generate large number of rows - for instance
>    hash table scan and sorter calls.  Here the idea is to do some
>    part of the work once leaving an InstanceGetter which then can
>    repeatedly give back new objects in the most optimized way:
> 
> again at this point dvd can be used to correctly compare against other
>      dvd's in possible collate specific ways."
> 
> thanks,
> Mamta
> On 4/14/07, *Daniel John Debrunner* <djd@apache.org 
> <mailto:djd@apache.org>> wrote:
> 
>     Mamta Satoor wrote:
>      > Hi Dan,
>      >
>      > The problem we are trying to solve is to provide Store a way to
>      > call a method (say it's called
>      > getInstanceGetterForFormatIDandCollationType) on DVF with format id &
>      > collation type and get an InstanceGetter for that combination.
> 
>     Why use InstanceGetter here?
> 
>      > Like Mike
>      > mentioned in his earlier mail (in this same thread, dated April 12th,
>      > 2nd mail from Mike) with point 3), Store will call this method
>     once and
>      > call getNewInstance on that InstanceGetter multiple times to get the
>     right
>      > DVD. If we don't change the InstanceGetter as I suggested, then that
>      > would mean that we will be creating 2 DVD objects for every character
>      > DVD through Store code. The worst part is we will be doing this
>      > unnecessary creation of 2 DVDs even for databases which want default
>      > collation. The two DVD creations I am talking about are these: first,
>      > through InstanceGetter, we will get, say, SQLChar. Then, at the time of actual
>      > collation comparison, it will have to call something like
>      > StringDataValue.getCollationValue(int collationType) to get
>     another DVD
>      > to make sure that the collation is being performed with the right DVD.
> 
>     It doesn't have to return another DVD, it can return itself if it is of
>     the correct type, thus no additional overhead for UCS_BASIC collation.
>     Thus this switch would happen once for the first collation, not every
>     collation, and of course not happen at all if no collation is involved.
> 
>      > What I am suggesting does not make InstanceGetter complicated. It is
>      > pretty simple implementation. All I am proposing is to have special
>      > InstanceGetter class for collation sensitive DVDs. This new
>      > InstanceGetter class will have RuleBasedCollator (which will be
>     set the
>      > first time this InstanceGetter is created for the given database
>     through
>      > the DVF) and it will have a collation type (this collation type will
>     always
>      > be set to whatever collation type
>      > getInstanceGetterForFormatIDandCollationType was called with. This
>      > collation type will determine which kind of DVD to generate, i.e.
>     one with
>      > default collation or one with territory-based collation). You
>     mentioned
>      > in your mail that "I got a little lost in the details". Please let me
>      > know where it was unclear and I can try to explain it better.
> 
>     Could you show an example of how the store will be calling the code you
>     are describing? Maybe that would help me out.
> 
>      >
>      > As for your question about "does it take account of the fact that
>     the
>      > registered format ids are system wide and there can be databases with
>      > different default collations in the same system?" My understanding is
>      > that there is one DVF per database and these InstanceGetters will be
>      > saved on DVF and hence I do not foresee any problems in having
>     multiple
>      > databases with different collations in the same Derby system.
> 
>     Dan.
> 
> 
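
For comparison, the alternative Dan describes in the quoted text (a value
that returns itself when it already has the requested collation) could be
sketched roughly as follows.  This is only an illustration under stated
assumptions: SqlChar, CollatedSqlChar and the collation constants are
hypothetical stand-ins, and getCollationValue here takes an extra Collator
argument that the method named in the thread does not have.

```java
import java.text.Collator;
import java.util.Locale;

public class CollationValueSketch {

    // Hypothetical collation type constants for this sketch.
    static final int UCS_BASIC = 0;
    static final int TERRITORY_BASED = 1;

    // Stand-in for a default-collation character value.
    static class SqlChar {
        protected final String value;
        SqlChar(String value) { this.value = value; }

        int collationType() { return UCS_BASIC; }

        int compare(SqlChar other) { return value.compareTo(other.value); }

        // Returns this object when it already has the requested collation,
        // so the default collation case allocates nothing extra.
        SqlChar getCollationValue(int collationType, Collator collator) {
            if (collationType == collationType()) {
                return this;
            }
            return collationType == TERRITORY_BASED
                    ? new CollatedSqlChar(value, collator)
                    : new SqlChar(value);
        }
    }

    // Stand-in for a territory-based character value.
    static class CollatedSqlChar extends SqlChar {
        private final Collator collator;
        CollatedSqlChar(String value, Collator collator) {
            super(value);
            this.collator = collator;
        }

        @Override
        int collationType() { return TERRITORY_BASED; }

        @Override
        int compare(SqlChar other) { return collator.compare(value, other.value); }
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        SqlChar plain = new SqlChar("a");

        // Same collation requested: no new object.
        System.out.println(plain.getCollationValue(UCS_BASIC, collator) == plain);

        // Different collation requested: a second object is created.
        SqlChar collated = plain.getCollationValue(TERRITORY_BASED, collator);
        System.out.println(collated != plain);
    }
}
```

In this sketch the extra allocation happens only when a territory-based
comparison is requested on a default value, which is the case the
collation-aware InstanceGetter avoids by creating the right type up front.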

