db-derby-dev mailing list archives

From "Mike Matrigali (JIRA)" <j...@apache.org>
Subject [jira] Created: (DERBY-2168) Create new row format for derby to optimize access to columns within a row
Date Tue, 12 Dec 2006 17:53:20 GMT
Create new row format for derby to optimize access to columns within a row

                 Key: DERBY-2168
                 URL: http://issues.apache.org/jira/browse/DERBY-2168
             Project: Derby
          Issue Type: Improvement
          Components: Store
    Affects Versions:
            Reporter: Mike Matrigali
            Priority: Minor

The current (and only) low-level row format for Derby was chosen at the beginning of the
project to be the most flexible, so it treats every column as variable length.  The row
format is simply a sequence of columns, each preceded by a header indicating how long it
is.  As a result, the store cannot determine where the Nth column is in a row without
first traversing the N-1 columns before it.  Queries that might benefit from a different
row format include:
1) non-covered queries which don't require all columns of data
2) non-index scans which disqualify a number of rows based on a subset of columns that
   don't happen to be the first N columns of the row.
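
To make the traversal cost concrete, here is a minimal sketch of a length-prefixed row.
It is an illustration only: the class name LengthPrefixedRow and the fixed 4-byte length
header are assumptions made for the example and are not Derby's actual field header
encoding in StoredPage.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration of a "sequence of length-prefixed columns" row. */
final class LengthPrefixedRow {

    /** Encode each column as a 4-byte length header followed by its bytes. */
    static byte[] encode(byte[][] columns) {
        int size = 0;
        for (byte[] col : columns) {
            size += Integer.BYTES + col.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (byte[] col : columns) {
            buf.putInt(col.length);
            buf.put(col);
        }
        return buf.array();
    }

    /** Fetch column n (0-based). Every earlier header must be read and skipped. */
    static byte[] readColumn(byte[] row, int n) {
        ByteBuffer buf = ByteBuffer.wrap(row);
        for (int i = 0; i < n; i++) {
            int len = buf.getInt();              // header of column i
            buf.position(buf.position() + len);  // skip the data of column i
        }
        int len = buf.getInt();
        byte[] out = new byte[len];
        buf.get(out);
        return out;
    }
}
```

Reading column n costs n header reads before any useful data is touched, which is the
overhead the two query types above pay when they only need a late column.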

A pretty standard row format would have some sort of table at the beginning which would
allow one to jump to a given offset of the row without going through all the other
columns.  Building up this table would likely increase the insert cost slightly, and
would increase the disk space required to store rows.
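
One possible shape for such a table is sketched below.  The layout (a column count,
then one end offset per column, then the column data) and the class name OffsetTableRow
are assumptions chosen for the example, not a proposed Derby format.

```java
import java.nio.ByteBuffer;

/** Hypothetical row layout: [int columnCount][int endOffset per column][data...]. */
final class OffsetTableRow {

    static byte[] encode(byte[][] columns) {
        int dataSize = 0;
        for (byte[] col : columns) {
            dataSize += col.length;
        }
        int headerSize = Integer.BYTES * (1 + columns.length);
        ByteBuffer buf = ByteBuffer.allocate(headerSize + dataSize);
        buf.putInt(columns.length);
        int end = 0;
        for (byte[] col : columns) {
            end += col.length;
            buf.putInt(end);  // end offset, relative to the start of the data area
        }
        for (byte[] col : columns) {
            buf.put(col);
        }
        return buf.array();
    }

    /** Fetch column n (0-based) with a constant number of header reads. */
    static byte[] readColumn(byte[] row, int n) {
        ByteBuffer buf = ByteBuffer.wrap(row);
        int count = buf.getInt(0);
        if (n < 0 || n >= count) {
            throw new IndexOutOfBoundsException("column " + n);
        }
        int dataStart = Integer.BYTES * (1 + count);
        // Column n starts where column n-1 ends; column 0 starts at offset 0.
        int start = (n == 0) ? 0 : buf.getInt(Integer.BYTES * n);
        int end = buf.getInt(Integer.BYTES * (n + 1));
        byte[] out = new byte[end - start];
        System.arraycopy(row, dataStart + start, out, 0, out.length);
        return out;
    }
}
```

In this sketch the encoder makes an extra pass to build the table and the row carries an
additional count field, which illustrates the insert-cost and space trade-off described
above.  The real overhead would depend on how compactly the offsets are encoded.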

Another standard kind of row format would optimize the storage of fixed-length fields.
Currently the store does not know anything about fixed-length fields, as each datatype
controls its own storage.  New interfaces could be added, either at create time or maybe
in the datatypes themselves, to export the knowledge that datatypes are fixed length.
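
As a rough sketch of what that exported knowledge could buy: if the store were told each
column's width, it could compute the offset of a column from the schema alone, with no
per-row headers to read.  The widths array, the VARIABLE marker and the class name
FixedWidthLayout are all hypothetical; no such interface exists in Derby today.

```java
/** Hypothetical helper: column offsets derived from declared fixed widths. */
final class FixedWidthLayout {

    /** Marker for a column whose datatype does not report a fixed width. */
    static final int VARIABLE = -1;

    /**
     * Returns the byte offset of column n (0-based) if every column before it
     * has a fixed width, or VARIABLE if a variable-length column precedes it.
     */
    static int offsetOf(int[] widths, int n) {
        int offset = 0;
        for (int i = 0; i < n; i++) {
            if (widths[i] == VARIABLE) {
                return VARIABLE;
            }
            offset += widths[i];
        }
        return offset;
    }
}
```

Because the result depends only on the table definition, it could be computed once per
table rather than once per row.  Columns that follow a variable-length column would still
need a header walk or an offset table.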

This is a big project.  Note that a lot of performance work in StoredPage has made it
"know" about the current record and field formats, as it was a big performance hit to
make class calls for every field traversal.  This means that adding a new record and/or
field format is not as isolated as one might hope.  Also, we are likely to need to
support both the old and new formats.  To anyone considering this work, I would suggest
building a very rough prototype with performance measurements first, to make sure you
are getting the expected performance before doing a lot of work.


