From: djd@apache.org
Date: Fri, 18 Feb 2005 17:49:18 -0000
To: derby-cvs@incubator.apache.org
Subject: svn commit: r154330 - incubator/derby/code/trunk/java/engine/org/apache/derby/iapi/types/package.html

Author: djd
Date: Fri Feb 18 09:49:16 2005
New Revision: 154330

URL: http://svn.apache.org/viewcvs?view=rev&rev=154330
Log:
Add information on type system to iapi.types package.html

Added:
    incubator/derby/code/trunk/java/engine/org/apache/derby/iapi/types/package.html   (with props)

Added: incubator/derby/code/trunk/java/engine/org/apache/derby/iapi/types/package.html
URL: http://svn.apache.org/viewcvs/incubator/derby/code/trunk/java/engine/org/apache/derby/iapi/types/package.html?view=auto&rev=154330
==============================================================================
--- incubator/derby/code/trunk/java/engine/org/apache/derby/iapi/types/package.html (added)
+++ incubator/derby/code/trunk/java/engine/org/apache/derby/iapi/types/package.html Fri Feb 18 09:49:16 2005
@@ -0,0 +1,342 @@

Derby Type System

The Derby type system is mainly contained in the org.apache.derby.iapi.types package. The two main classes are DataValueDescriptor and DataTypeDescriptor.

DataValueDescriptor

Values in Derby are always represented by instances of org.apache.derby.iapi.types.DataValueDescriptor, which might have been better named DataValue. DataValueDescriptor, or DVD for short, is mainly used to represent SQL data values, though it is also used for other internal types. DataValueDescriptor is a Java interface, and in general all values are manipulated through interfaces rather than through Java class implementations such as SQLInteger. DVDs are mutable (their value can change) and can represent either NULL or a valid value. Note that SQL NULL is represented by a DataValueDescriptor whose state is NULL, not by a null Java reference to a DataValueDescriptor.
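The distinction between a NULL state and a null reference can be illustrated with a minimal sketch. IntValue and SqlIntHolder are hypothetical names chosen for illustration. They are not Derby's DataValueDescriptor or SQLInteger. The sketch only shows a mutable holder whose SQL NULL is a state of the object.

```java
// Hypothetical sketch: IntValue and SqlIntHolder are illustrative names, not
// Derby's DataValueDescriptor or SQLInteger.
interface IntValue {
    boolean isNull();
    int getInt();            // only meaningful when isNull() is false
    void setValue(int v);    // mutates this holder in place
    void setToNull();        // SQL NULL is a state of the holder
}

final class SqlIntHolder implements IntValue {
    private int value;
    private boolean isNull = true;   // a fresh holder starts out as SQL NULL

    @Override public boolean isNull() { return isNull; }

    @Override public int getInt() {
        if (isNull) {
            throw new IllegalStateException("value is SQL NULL");
        }
        return value;
    }

    @Override public void setValue(int v) { value = v; isNull = false; }

    @Override public void setToNull() { value = 0; isNull = true; }
}

class NullStateDemo {
    public static void main(String[] args) {
        IntValue v = new SqlIntHolder();   // never a null Java reference
        System.out.println(v.isNull());    // true
        v.setValue(42);                    // same object, new value
        System.out.println(v.getInt());    // 42
        v.setToNull();                     // same object, now SQL NULL
        System.out.println(v.isNull());    // true
    }
}
```

Code that receives an IntValue can therefore always call isNull() on it without first guarding against a null reference.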
Generally the Derby engine works on an array of DVDs that represents a row. The row can correspond to a row in a table, a row in a ResultSet to be returned to the application, or an intermediate row in a query. The DVDs within this array are reused for each row processed, which is why they are mutable. For example, when reading rows from the store, a single DVD is used to read a column's value for all the rows processed. This benefits performance. In a table scan of one million rows Derby does not create one million objects, as it would have to if the type system were immutable like the Java object wrappers such as java.lang.Integer.
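The following minimal sketch shows the reuse pattern. It assumes a scan that overwrites the same row array for every row. MutableIntCell and RowReuseDemo are hypothetical stand-ins for a DVD and a store scan. The column values are derived from the row number purely for illustration.

```java
// Hypothetical sketch: MutableIntCell stands in for a DataValueDescriptor and
// RowReuseDemo for a store scan; neither is a Derby class.
final class MutableIntCell {
    static int created = 0;   // counts holder allocations
    private int value;

    MutableIntCell() { created++; }

    void setValue(int v) { value = v; }
    int getInt() { return value; }
}

class RowReuseDemo {
    // Scans rowCount rows of two INT columns, reusing one row array of holders.
    static long scan(int rowCount) {
        MutableIntCell[] row = { new MutableIntCell(), new MutableIntCell() };
        long sum = 0;
        for (int r = 0; r < rowCount; r++) {
            row[0].setValue(r);       // overwrite the existing holders in place
            row[1].setValue(r * 2);
            sum += row[1].getInt();
        }
        return sum;
    }

    public static void main(String[] args) {
        long sum = scan(1_000_000);
        // prints "999999000000 computed using 2 holders"
        System.out.println(sum + " computed using " + MutableIntCell.created + " holders");
    }
}
```

The number of holders depends on the number of columns, not on the number of rows scanned.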
The methods in DataValueDescriptor can be broken into these groups:

Type Specific Interfaces

Java interfaces that extend DataValueDescriptor exist to support operators specific to a type, or to a set of types.
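The layering can be sketched as follows. Value, NumericValue, StringValue, IntCell and CharCell are hypothetical names that mirror the DataValueDescriptor / NumberDataValue split. The accessors longValue() and charLength() are assumptions made for the example. Generic code depends only on the base interface, and type-specific code asks for the narrower interface.

```java
// Hypothetical sketch: all names here are illustrative, not Derby's interfaces.
interface Value {
    boolean isNull();
}

// Type-specific interfaces add operations that only make sense for some types.
interface NumericValue extends Value {
    long longValue();       // assumed accessor, only valid when !isNull()
}

interface StringValue extends Value {
    int charLength();       // assumed accessor, only valid when !isNull()
}

final class IntCell implements NumericValue {
    private final Integer v;            // null here encodes SQL NULL
    IntCell(Integer v) { this.v = v; }
    @Override public boolean isNull() { return v == null; }
    @Override public long longValue() { return v.longValue(); }
}

final class CharCell implements StringValue {
    private final String s;             // null here encodes SQL NULL
    CharCell(String s) { this.s = s; }
    @Override public boolean isNull() { return s == null; }
    @Override public int charLength() { return s.length(); }
}

class TypeInterfacesDemo {
    // Generic code needs only the base interface, whatever the column types.
    static int countNulls(Value[] row) {
        int n = 0;
        for (Value v : row) {
            if (v.isNull()) n++;
        }
        return n;
    }

    // Numeric code asks for the type-specific interface.
    static long sumNonNull(NumericValue[] column) {
        long sum = 0;
        for (NumericValue v : column) {
            if (!v.isNull()) sum += v.longValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        Value[] row = { new IntCell(1), new CharCell(null), new IntCell(null) };
        System.out.println(countNulls(row));     // 2
        NumericValue[] col = { new IntCell(4), new IntCell(null), new IntCell(6) };
        System.out.println(sumNonNull(col));     // 10
    }
}
```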

Language Compilation

Much of the code generated by the language layer involves the type system. For example, SQL operators are converted to method calls on interfaces within the type system, such as DataValueDescriptor or NumberDataValue, so all of this generated code makes its method calls through interfaces. The language layer has a policy of generating fields that hold a result holder object for each operation. This holder DataValueDescriptor is then reused each time the operation runs within that query execution, which saves object creation when the operation is called on multiple rows. The generated code does not create the initial value for the field. Instead, the operator method or a DataValueFactory method creates an instance the first time the result is passed in as null. The approximate Java code for this would be as follows (note that the generator emits byte code directly):
   // instance field to hold the result of the minus
   private NumberDataValue f7;

   ...

   // code within a generated method
   f7 = value.minus(f7);
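The result-holder style can be modelled with a self-contained sketch. NumberHolder and GeneratedQuery are hypothetical names. The operand arrangement (receiver, right input, result) is also an assumption, since the snippet above is only approximate. The operator allocates a holder only when it receives null, so only the first row pays for the allocation.

```java
// Hypothetical sketch of the result-holder style; NumberHolder and
// GeneratedQuery are illustrative names, and the (receiver, right, result)
// operand arrangement is an assumption.
final class NumberHolder {
    static int created = 0;      // counts holder allocations
    private long value;
    private boolean isNull = true;

    NumberHolder() { created++; }
    NumberHolder(long v) { this(); setValue(v); }

    boolean isNull() { return isNull; }
    long getLong() { return value; }
    void setValue(long v) { value = v; isNull = false; }
    void setToNull() { isNull = true; }

    // Computes this - right into result, allocating a holder only when the
    // caller passes in null (the first call).
    NumberHolder minus(NumberHolder right, NumberHolder result) {
        if (result == null) {
            result = new NumberHolder();
        }
        if (isNull || right.isNull) {
            result.setToNull();
        } else {
            result.setValue(Math.subtractExact(value, right.value));
        }
        return result;
    }
}

// Plays the role of a generated class: f7 starts out null and is filled in
// by the operator on the first row, then reused.
class GeneratedQuery {
    private NumberHolder f7;

    NumberHolder evaluateRow(NumberHolder left, NumberHolder right) {
        f7 = left.minus(right, f7);
        return f7;
    }
}
```

Calling evaluateRow for every row returns the same NumberHolder object each time, with its value overwritten.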


Interaction with Store

The store knows little about how values represent themselves in bytes; all of that knowledge is contained within the DVD implementation.
The exception is SQL NULL handling. The store handles NULL values consistently, as a null bit in the status byte for a field. Thus readExternal and writeExternal are never called for a DataValueDescriptor that is NULL.
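A minimal sketch of this split of responsibility follows. It assumes a one-byte status field per column with bit 0 as the null bit. That layout, and the names StoredInt and FieldCodec, are hypothetical and are not Derby's actual on-disk format. The value serializes only its own bytes, and the store writes and tests the null bit.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: the one-byte status field with a NULL bit, and the
// StoredInt / FieldCodec names, are assumptions, not Derby's on-disk format.
final class StoredInt {
    int value;
    boolean isNull;
    int writeCalls;   // how often the value serialised itself

    void writeExternal(DataOutput out) throws IOException {
        writeCalls++;
        out.writeInt(value);
    }

    void readExternal(DataInput in) throws IOException {
        value = in.readInt();
        isNull = false;
    }
}

final class FieldCodec {
    static final int NULL_BIT = 0x01;

    // The store records NULL itself; writeExternal is skipped for NULL values.
    static byte[] encode(StoredInt... fields) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (StoredInt f : fields) {
                if (f.isNull) {
                    out.writeByte(NULL_BIT);
                } else {
                    out.writeByte(0);
                    f.writeExternal(out);
                }
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // readExternal is only called when the NULL bit is clear.
    static StoredInt[] decode(byte[] data, int count) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            StoredInt[] fields = new StoredInt[count];
            for (int i = 0; i < count; i++) {
                fields[i] = new StoredInt();
                int status = in.readUnsignedByte();
                if ((status & NULL_BIT) != 0) {
                    fields[i].isNull = true;
                } else {
                    fields[i].readExternal(in);
                }
            }
            return fields;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this layout a NULL column costs one status byte, and the value object is never asked to serialize itself.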

Delayed Object Creation

When a value reads itself from its byte representation, it must perform the least amount of work needed to obtain a useful representation of the value. The value being read from disk may never be returned to the application, or it may be returned but never used by the application. The first case can occur when a qualification in the SQL statement is executed at the language layer rather than pushed down to the store. The row is then fetched from the store but filtered out at the language layer. Take SQLDecimal as an example: its byte format is a representation of a java.math.BigInteger instance along with a scale. If SQLDecimal took the simple approach of always using a java.math.BigDecimal, these are the steps that would occur when reading a DECIMAL column:
  1. Read the BigInteger format into a byte array, and read the scale.
  2. Create a new BigInteger instance from the byte array: 2 object creations and a byte array copy.
  3. Create a new BigDecimal instance from the BigInteger and the scale: 1 object creation.
Now consider a one-million-row table scan with a DECIMAL column that returns 1% of the rows to the application, with the filtering performed at the language layer.
This simple SQLDecimal implementation would create 3 million objects and perform 1 million byte array copies.
The smart (and current) implementation of SQLDecimal delays steps 2 and 3 until a BigDecimal object is actually needed, e.g. when the application calls ResultSet.getBigDecimal. Assume the application calls getBigDecimal for every row it receives. Since only 1% of the rows are returned, 30,000 objects are created and 10,000 byte array copies are made. This saves 2,970,000 object creations, 990,000 byte array copies, and the garbage collection overhead of those short-lived objects.
This delayed object creation increases the complexity of the DataValueDescriptor implementation, but the performance benefit is well worth it. The complexity comes from the implementation maintaining dual state: in the SQLDecimal case the value is represented either by the raw value or by a BigDecimal object. The implementation takes care to always access the value through methods, not through the fields directly. String-based values such as SQLChar also delay creating a String object, because creating a String requires two object creations and a char array copy. In the case of SQLChar, though, the raw value is kept as a char array rather than a byte array. This is because the char[] can be used as-is as the value, e.g. in string comparisons.
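The dual-state technique can be sketched as follows. LazyDecimal is a hypothetical name, not Derby's SQLDecimal. The raw format (unscaled two's-complement bytes plus a scale) is an assumption that follows the description above. readRaw only copies bytes into a reused buffer, and the BigInteger and BigDecimal objects are built the first time getBigDecimal is called for the current row.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical sketch of dual-state, delayed object creation; LazyDecimal is
// an illustrative name, not Derby's SQLDecimal, and the raw format (unscaled
// two's-complement bytes plus a scale) is an assumption.
final class LazyDecimal {
    static int materialized = 0;         // BigDecimal objects actually built

    private byte[] raw = new byte[16];   // reused across rows
    private int rawLength;
    private int rawScale;
    private boolean rawIsCurrent;        // which of the two states is valid
    private BigDecimal value;

    // Cheap read: copy bytes into the reused buffer, build no objects.
    void readRaw(byte[] src, int length, int scale) {
        if (raw.length < length) {
            raw = new byte[length];
        }
        System.arraycopy(src, 0, raw, 0, length);
        rawLength = length;
        rawScale = scale;
        rawIsCurrent = true;
        value = null;
    }

    // All access goes through this method, never the fields directly; the
    // object creations happen only here, and only once per row.
    BigDecimal getBigDecimal() {
        if (rawIsCurrent) {
            byte[] unscaled = Arrays.copyOf(raw, rawLength);
            value = new BigDecimal(new BigInteger(unscaled), rawScale);
            rawIsCurrent = false;
            materialized++;
        }
        return value;
    }
}

class LazyDecimalDemo {
    public static void main(String[] args) {
        LazyDecimal column = new LazyDecimal();
        int returned = 0;
        for (int r = 0; r < 1_000_000; r++) {
            // toByteArray() only simulates the bytes read from disk
            byte[] unscaled = BigInteger.valueOf(r).toByteArray();
            column.readRaw(unscaled, unscaled.length, 2);
            if (r % 100 == 0) {          // language-layer filter keeps 1%
                column.getBigDecimal();  // e.g. ResultSet.getBigDecimal
                returned++;
            }
        }
        System.out.println(returned + " rows, " + LazyDecimal.materialized + " BigDecimals");
    }
}
```

Rows rejected by the filter never leave the raw state, so no BigInteger or BigDecimal objects are created for them.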

DataValueFactory

Specific instances of DataValueDescriptor are mostly created through the DataValueFactory interface. This hides the implementation of types from the JDBC, language and store layers. This interface includes methods to:
  • generate new NULL values for specific SQL types.
  • generate specific types from Java primitives or Java objects (such as String). The returned type corresponds to the JDBC mapping for the Java type, e.g. SQLInteger for int. Where the Java type can map to multiple SQL types, there are specific methods such as getChar and getVarchar.
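The shape of such a factory can be sketched as follows. Val, ValueFactory and SimpleValueFactory are hypothetical, and so are all the method names except the getChar and getVarchar names mentioned above. Even those are only modelled loosely and do not use Derby's exact signatures. Callers see only the interface types, while the implementation classes stay private to the factory.

```java
// Hypothetical sketch: Val, ValueFactory and SimpleValueFactory, and their
// method signatures, are illustrative, not Derby's DataValueFactory API.
interface Val {
    boolean isNull();
    String sqlTypeName();
}

interface ValueFactory {
    Val getNullInteger();        // a NULL value of a specific SQL type
    Val getDataValue(int v);     // JDBC mapping: int -> INTEGER
    Val getChar(String s);       // String maps to several SQL types,
    Val getVarchar(String s);    // so each gets its own method
}

final class SimpleValueFactory implements ValueFactory {
    // Implementation classes are private: callers only ever see Val.
    private static final class IntegerVal implements Val {
        private final Integer v;                 // null encodes SQL NULL
        IntegerVal(Integer v) { this.v = v; }
        @Override public boolean isNull() { return v == null; }
        @Override public String sqlTypeName() { return "INTEGER"; }
    }

    private static final class StringVal implements Val {
        private final String type;
        private final String v;                  // null encodes SQL NULL
        StringVal(String type, String v) { this.type = type; this.v = v; }
        @Override public boolean isNull() { return v == null; }
        @Override public String sqlTypeName() { return type; }
    }

    @Override public Val getNullInteger() { return new IntegerVal(null); }
    @Override public Val getDataValue(int v) { return new IntegerVal(v); }
    @Override public Val getChar(String s) { return new StringVal("CHAR", s); }
    @Override public Val getVarchar(String s) { return new StringVal("VARCHAR", s); }
}
```

Swapping in a different implementation of a type, for example a platform-specific DECIMAL, would then only require changing the factory.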


DataTypeDescriptor

The SQL type of a column, value or expression is represented by an instance of org.apache.derby.iapi.types.DataTypeDescriptor. DataTypeDescriptor contains three key pieces of information:
  1. The fundamental SQL type, e.g. INTEGER or DECIMAL, represented by an org.apache.derby.iapi.types.TypeId.
  2. Any length, precision or scale attributes, e.g. length for CHAR, precision and scale for DECIMAL.
  3. Whether the type is nullable.
Note that a DataValueDescriptor is not tied to any DataTypeDescriptor. Setting a value into a DataValueDescriptor that does not conform to the intended DataTypeDescriptor is therefore allowed. The value is checked in an explicit normalization phase. For example, an application can use setBigDecimal() to set 199.0 into a parameter that is declared as DECIMAL(4,2). The out-of-range exception is raised only in the execute phase.
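A minimal sketch of a type descriptor with an explicit normalization step follows. TypeDesc and DecimalParameter are hypothetical names, not Derby's DataTypeDescriptor. Truncating extra fraction digits with RoundingMode.DOWN is an assumption about the normalization rule. The parameter accepts any value when it is set, and the check against the declared precision and scale happens only in execute().

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical sketch: TypeDesc and DecimalParameter are illustrative names,
// not Derby's DataTypeDescriptor; truncating extra fraction digits (DOWN) is
// an assumption about the normalization rule.
final class TypeDesc {
    final String typeName;   // fundamental SQL type, e.g. "DECIMAL"
    final int precision;     // total number of digits
    final int scale;         // digits after the decimal point
    final boolean nullable;

    TypeDesc(String typeName, int precision, int scale, boolean nullable) {
        this.typeName = typeName;
        this.precision = precision;
        this.scale = scale;
        this.nullable = nullable;
    }

    // Explicit normalization phase: the value is checked against the type here.
    BigDecimal normalize(BigDecimal v) {
        if (v == null) {
            if (!nullable) {
                throw new IllegalArgumentException("NULL not allowed for " + this);
            }
            return null;
        }
        BigDecimal scaled = v.setScale(scale, RoundingMode.DOWN);
        if (scaled.precision() > precision) {
            throw new ArithmeticException(v + " is out of range for " + this);
        }
        return scaled;
    }

    @Override public String toString() {
        return typeName + "(" + precision + "," + scale + ")";
    }
}

// The value holder is not tied to the type: setting any value is accepted.
final class DecimalParameter {
    private final TypeDesc declared;
    private BigDecimal value;

    DecimalParameter(TypeDesc declared) { this.declared = declared; }

    void setBigDecimal(BigDecimal v) { value = v; }             // no check here

    BigDecimal execute() { return declared.normalize(value); }  // checked here
}
```

With DECIMAL(4,2), setting 199.0 succeeds. execute() then fails because 199.00 needs five digits of precision.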

Issues


Interfaces or Classes

Matching the interface type hierarchy is an implementation (class) hierarchy, complete with abstract types. For example, DataType (again badly named) is the abstract root for all implementations of DataValueDescriptor, and NumberDataType is the abstract root for implementations of NumberDataValue. Code would be smaller and faster if the interfaces were removed and the official API became the public methods of these abstract classes. The work involved is mainly in the code generation that involves types. Regular Java code would compile correctly after such a change, but the code generator would have to be changed by hand to emit class method calls instead of interface method calls. Any change like this should probably rename the abstract classes to short descriptive names, like DataValue and NumberValue.

DataValueFactory

There is a demonstrated need to hide the implementation of DECIMAL, because J2ME, J2SE and J2SE5 require different versions, so a type implementation factory is required. However, supporting different implementations of INTEGER, BIGINT and some other fundamental types seems too generic. Perhaps the code could be simplified to allow SQLInteger, SQLLong and others to be used directly, at least for the SQL types that are implemented using Java primitives.

Result Holder Generation

The dynamic creation of result holders (see the Language Compilation section) means that every operator has to check whether the result reference passed in is null, and if so create a new instance of the desired type. This check seems inefficient, as it is performed on every call of the operation; again, imagine the million-row query. In addition, the generated code reassigns the field holding the result holder on every call, even though the value never changes. It seems that the code using the type system, whether generated or hand-written, could set up the result holder at initialization time. That would remove the need for both the check and the field assignment, leading to smaller, faster code.
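The proposal can be sketched under the assumption that the caller always supplies a non-null result holder. LongHolder and PreparedPlan are hypothetical names. The holder is created once, when the plan is constructed. The per-row path then contains no null test on the holder and no field assignment.

```java
// Hypothetical sketch of the proposal; LongHolder and PreparedPlan are
// illustrative names. The holder is created once, at initialization.
final class LongHolder {
    private long value;
    private boolean isNull = true;

    boolean isNull() { return isNull; }
    long getLong() { return value; }
    void setValue(long v) { value = v; isNull = false; }
    void setToNull() { isNull = true; }

    // result must be non-null: no per-call null check or allocation.
    void minus(LongHolder right, LongHolder result) {
        if (isNull || right.isNull) {
            result.setToNull();
        } else {
            result.setValue(Math.subtractExact(value, right.value));
        }
    }
}

class PreparedPlan {
    private final LongHolder f7 = new LongHolder();   // set up at initialization

    // Per-row work: no null test on the holder and no field assignment.
    LongHolder evaluateRow(LongHolder left, LongHolder right) {
        left.minus(right, f7);
        return f7;
    }
}
```

Declaring the field final also documents that the holder is assigned exactly once.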

NULL and operators

The operators typically have to check for incoming NULL values and set the result to NULL if any of the inputs are NULL. Combined with the result holder generation issue, this leads to a lot of duplicated code that checks whether the inputs are NULL. It is currently hard to do this in a single method. The code needs to determine whether the inputs are NULL, generate a result holder, and return two values: whether the result is NULL, and the result holder. Splitting each operator method in two would help, because at least the NULL checks could live in the superclass for all the types rather than in each implementation. This would also make it possible to generate calls to a more efficient operator when the inputs are not nullable. For example, the + operator could have plus() and plusNotNull() methods. plus() would be implemented in the NumberDataType class, handle NULL inputs, and call plusNotNull(). plusNotNull() would be implemented in each specific type.
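The proposed split can be sketched as follows. AbstractNumber and SqlInt32 are hypothetical stand-ins for NumberDataType and a concrete type. The sketch also assumes the result holder is allocated by the caller. plus() handles NULL once in the superclass and delegates to the type-specific plusNotNull().

```java
// Hypothetical sketch of the proposed plus()/plusNotNull() split;
// AbstractNumber and SqlInt32 are illustrative names, and the result holder
// is assumed to be allocated by the caller.
abstract class AbstractNumber {
    protected boolean isNull = true;

    final boolean isNull() { return isNull; }
    final void setToNull() { isNull = true; }

    // NULL handling written once, in the superclass for all numeric types.
    final void plus(AbstractNumber right, AbstractNumber result) {
        if (isNull || right.isNull) {
            result.setToNull();
        } else {
            plusNotNull(right, result);
        }
    }

    // Type-specific arithmetic; code generated for non-nullable inputs could
    // call this directly and skip the NULL checks.
    abstract void plusNotNull(AbstractNumber right, AbstractNumber result);
}

final class SqlInt32 extends AbstractNumber {
    private int value;

    SqlInt32() {}
    SqlInt32(int v) { setValue(v); }

    void setValue(int v) { value = v; isNull = false; }
    int getInt() { return value; }

    @Override
    void plusNotNull(AbstractNumber right, AbstractNumber result) {
        int sum = Math.addExact(value, ((SqlInt32) right).value); // throws on overflow
        ((SqlInt32) result).setValue(sum);
    }
}
```

Math.addExact stands in for SQL overflow checking here. It throws ArithmeticException on int overflow.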

Operators and self

It seems the operator methods should almost always act on their own value. For example, plus() should take only one input, and the result is the value of the receiver (self) added to that input. Currently plus() takes two inputs, and probably in most if not all cases the left input is the receiver. The result would be smaller and possibly faster code, as the method calls on self would not go through an interface.
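The two operator shapes can be contrasted on a single hypothetical concrete class, Int64Value, which is not a Derby type. Both methods require a non-null result holder. The three-argument form just forwards to the receiver-based form. This makes explicit that the left input is normally the receiver.

```java
// Hypothetical sketch contrasting the two operator shapes; Int64Value is an
// illustrative concrete class, not a Derby type. Result holders must be non-null.
final class Int64Value {
    private long value;
    private boolean isNull = true;

    Int64Value() {}
    Int64Value(long v) { set(v); }

    void set(long v) { value = v; isNull = false; }
    void setToNull() { isNull = true; }
    boolean isNull() { return isNull; }
    long get() { return value; }

    // Current shape: two inputs plus a result holder, even though in practice
    // the receiver is usually the left input.
    Int64Value plus(Int64Value left, Int64Value right, Int64Value result) {
        return left.plus(right, result);
    }

    // Proposed shape: one input; the receiver's own fields are the left operand
    // and are read directly rather than through an interface call.
    Int64Value plus(Int64Value right, Int64Value result) {
        if (isNull || right.isNull) {
            result.setToNull();
        } else {
            result.set(Math.addExact(value, right.value));
        }
        return result;
    }
}
```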
Propchange: incubator/derby/code/trunk/java/engine/org/apache/derby/iapi/types/package.html
------------------------------------------------------------------------------
    svn:eol-style = native