db-derby-dev mailing list archives

From Daniel John Debrunner <...@debrunners.com>
Subject Re: Type system notes
Date Fri, 18 Feb 2005 20:08:21 GMT
RPost wrote:
> Thanks for the extended notes. These are very helpful.
>
>>"Daniel John Debrunner" wrote:
>
>>https://svn.apache.org/viewcvs.cgi/*checkout*/incubator/derby/code/trunk/java/engine/org/apache/derby/iapi/types/package.html
>
>>Generally the Derby engine works upon an array of DVD's that represent a row
>
> Are these same DVDs also then used by the log module to create log records?

Not sure what this question is really asking, but there is only one
DataValueDescriptor interface. Some store classes are implementations of it.

>>Interaction with Store
> 
> What interaction is there with other modules (or functionality) such as
> log/restore/recovery or the catalog? Are external versions of type
> descriptors used to create the catalog descriptions of the columns? Used in
> metadata queries?

That would indeed be a good write-up: deeper insight into the TypeId
side. Since I wasn't working in that area, it wasn't fresh in my mind.

>>DataTypeDescriptor
>>Note that a DataValueDescriptor is not tied to any DataTypeDescriptor
>
> Is this for the same performance reasons given in the DataValueDescriptor
> section? There you said: 'For example in reading rows from the store a
> single DVD is used to read a column's value for all the rows processed'. I
> assume that not tying the value and type descriptors together means that the
> value descriptors don't need to validate the type when being reused during
> reads from the store.

One reason is memory overhead: tying a DVD to a DTD would mean each DVD
has an extra instance field that is the reference to the DTD. Another is
that these objects were previously written in network protocols and thus
needed to be created context-free.
I think you are also correct in the performance assumption, as Derby can
avoid the normalization step if the input and output types are compatible,
e.g. CHAR(5) to CHAR(10) does not need a length check.
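To make the reuse point concrete, here is a minimal stand-alone sketch (the class names CharValue, CharType and their methods are hypothetical stand-ins, not the real Derby classes): one value holder is recycled across rows, and a per-row length check is only needed when the source type could hold values wider than the target.

```java
// Hypothetical sketch of the reuse/normalization idea; not Derby code.
class CharValue {
    String value;
    void setValue(String v) { value = v; }
}

class CharType {
    final int maxLength;
    CharType(int maxLength) { this.maxLength = maxLength; }

    // A per-row check is only required when the source type can hold
    // values wider than this type allows.
    boolean needsNormalize(CharType source) {
        return source.maxLength > this.maxLength;
    }

    void normalize(CharValue v) {
        if (v.value.length() > maxLength)
            throw new IllegalStateException(
                "value too long for CHAR(" + maxLength + ")");
    }
}

public class NormalizeDemo {
    public static void main(String[] args) {
        CharType char5 = new CharType(5);
        CharType char10 = new CharType(10);

        // One holder reused for every row read from the store.
        CharValue holder = new CharValue();

        // CHAR(5) -> CHAR(10): every source value fits, skip row checks.
        System.out.println(char10.needsNormalize(char5)); // false
        // CHAR(10) -> CHAR(5): each row's value must be length-checked.
        System.out.println(char5.needsNormalize(char10)); // true

        holder.setValue("abc");
        char5.normalize(holder); // passes: length 3 <= 5
    }
}
```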


>>Issues
>>Interfaces or Classes
>>Code would be smaller and faster if the interfaces were removed
>
> Do you have any sense or 'guesstimate' as to what the maximum potential size
> or speed savings could be?
>
> Do you think this may be necessary (as opposed to desirable) for certain
> environments such as mobile or wireless?
>
> Is it conceptually possible to design a 'proof of concept' that might
> provide at least an estimate of the savings that might be achieved? That is,
> is there any specific test case that might be useful to see if it is worth
> exploring further or would the changes be extensive even to perform a
> limited test? Obviously the simpler the case the better.

I don't think it's necessary but it offends me :-). Looking at XP
practices, it would fall into the 'refactor' bucket.

I think it's actually fairly easy to test this. Write really simple
tests using direct creation of SQLIntegers, completely outside of the
engine. Focus on plus(): write a simple addition using plus() that
simulates what the engine does, and execute one million additions using

1) references through NumberDataValue
2) references through NumberDataType
3+) various modified plus() methods as described in the type's package.html.
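A self-contained sketch of cases 1 and 2 might look like the following. Note these are hypothetical stubs that only mimic the shape of Derby's NumberDataValue/NumberDataType/SQLInteger hierarchy, not the real classes, and a naive loop like this is only a rough indicator (a harness such as JMH would give trustworthy numbers):

```java
// Stand-alone stubs mimicking the Derby type hierarchy; not Derby code.
interface NumberDataValue {
    int getInt();
    NumberDataValue plus(NumberDataValue left, NumberDataValue right,
                         NumberDataValue result);
}

abstract class NumberDataType implements NumberDataValue {
    // shared behaviour would live here in the real hierarchy
}

class SQLInteger extends NumberDataType {
    private int value;
    SQLInteger(int v) { value = v; }
    public int getInt() { return value; }
    public NumberDataValue plus(NumberDataValue left, NumberDataValue right,
                                NumberDataValue result) {
        if (result == null) result = new SQLInteger(0);
        ((SQLInteger) result).value = left.getInt() + right.getInt();
        return result;
    }
}

public class PlusBench {
    // Case 1: one million additions through interface references.
    static long viaInterface(NumberDataValue a, NumberDataValue b,
                             NumberDataValue r) {
        long start = System.nanoTime();
        for (int i = 0; i < 1_000_000; i++) r = a.plus(a, b, r);
        return System.nanoTime() - start;
    }

    // Case 2: one million additions through abstract-class references.
    static long viaClass(NumberDataType a, NumberDataType b,
                         NumberDataType r) {
        long start = System.nanoTime();
        for (int i = 0; i < 1_000_000; i++)
            r = (NumberDataType) a.plus(a, b, r);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        SQLInteger a = new SQLInteger(2), b = new SQLInteger(3);
        SQLInteger r = new SQLInteger(0);
        System.out.println("interface ns: " + viaInterface(a, b, r));
        System.out.println("class ns:     " + viaClass(a, b, r));
    }
}
```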

>>Result Holder Generation
>>The dynamic creation of result holders (see language section) means that
>>all operators have to check for the result reference being passed in being
>>null, and if so create a new instance of the desired type
> 
> Could a result holder cache/factory be used effectively for this? Perhaps a
> separate thread that maintains a cache of new instances of various types.
> The size of the cache could be configurable by introducing a new property.
> This would allow the null checks to be removed from the operator code and
> the operator code would not have to wait synchronously for instance
> creation. Obviously there would be asynchronous waits since the cache would
> never be big enough for large numbers of rows.

I don't think a cache is a good idea; it's too much complexity. All I
was trying to say is that the generated code could ensure the field is
initialized at statement initialization time, thus ensuring the field
is never null when the operator is executed. That removes the need in
the operators to check whether the result holder is null, i.e. their
api is defined as: the result holder passed in must never be null. It
also removes the need in the generated code to set the field with the
return (result) of the method call. A possible extra step is to define
the methods as void, as the caller already has the reference to the result.
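The two calling conventions could be sketched like this (hypothetical stand-ins, not actual Derby generated code; the names IntValue, plusChecked and plusInto are invented for illustration):

```java
// Sketch of the two operator calling conventions; not Derby code.
class IntValue {
    int value;
    IntValue(int v) { value = v; }

    // Current style: the result holder may be null, so every operator
    // must check, possibly allocate, and return the holder.
    IntValue plusChecked(IntValue other, IntValue result) {
        if (result == null) result = new IntValue(0);
        result.value = this.value + other.value;
        return result;
    }

    // Proposed style: the generated code guarantees the holder was
    // initialized at statement setup, so the operator can be void and
    // skip both the null check and the return-value assignment.
    void plusInto(IntValue other, IntValue result) {
        result.value = this.value + other.value;
    }
}

public class ResultHolderDemo {
    public static void main(String[] args) {
        IntValue a = new IntValue(2), b = new IntValue(3);
        // The "generated code" initializes the holder once, up front.
        IntValue holder = new IntValue(0);
        a.plusInto(b, holder);
        System.out.println(holder.value); // 5
    }
}
```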

Thanks for the questions!
Dan.


