jakarta-slide-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kumar, Ashok" <aku...@metatomix.com>
Subject Proposal for a more efficient Slide Database Design
Date Fri, 12 Apr 2002 20:14:31 GMT
Hi,

New Slide Database Store Proposal
---------------------------------

Here is a proposal for a more optimized database schema for the Slide
J2EE Database Store. The major changes to the current schema involve
adding numeric identifiers for URI's and other strings that were
previously used as keys. This allows for much faster lookups and the use
of foreign keys for referential integrity. Other changes include merging
and dropping some of the existing tables. All the existing slide
functionality was maintained.


Justification for the new schema
--------------------------------

Using the existing Slide Database store for storing and retrieving a
large number of documents is extremely inefficient. The primary reason
for this is the lack of indexing. Applying indexing to the existing
database schema definitely helps. However, due to the fact that the URI
field (being a large VARCHAR(800) datatype) is the main key linking most
of the tables, it makes a very inefficient index. 

In addition, most databases have a limit on index size. In tables like
CHILDREN, where the unique index would be a combination of the URI and
CHILD_URI fields, the index size would exceed most database limits.

Finally, duplicating the data in large fields like URI in almost all of
the database tables is an inefficient waist of space and simple
normalization of this field provides a dramatic improvement in this
area.


The New Schema
--------------

Here is an overview of the schema changes:

- Three new tables were created to hold numeric key mappings for URI,
BRANCH and LABEL 
- WORKING_REVISION ->   removed
- LATEST_REVISIONS and REVISION -> VERSION_HISTORY
- REVISION_CONTENT ->   VERSION_CONTENT
- REVISIONS        ->   VERSION
- BRANCHES 	       ->   VERSION_PREDS (predecessors)
- LABEL            ->   VERSION_LABELS
- OBJECT           ->  (same + URI key mapping)
- LOCKS            -> (same)
- LINKS            -> (same)
- CHILDREN         -> (same)
- PERMISSIONS      -> (same)
- PROPERTY 	       -> (same)

Plus the addition of indexing and foreign key references.

The following schema will work with MS SQLServer and Sybase.
(although with Sybase the user defined datatypes need to be changed to
map to Sybase datatype limits)
Other schema's for Oracle, DB2 and MySQL will follow.

<<see attached file: new_slide_schema.zip>>


New Schema Implementation
------------------------

In order not to modify the existing database stores we have created
J2EEStore2. This store can be configured to point to a different
database instance that reflects the new schema.

Source files:
J2EEStore2.java( same as the J2EEStore.java with the addition of
hashtables for URI_ID lookup)
J2EEDescriptorsStore2.java
J2EEContentStore2.java

<<see attached java source files : J2EEStore.zip>>


Tests Conducted
---------------

Configuration:

Latest version of slide (from CVS as of March 25) with J2EEStore2
Tomcat 4.04 
MS SQLServer as the database store.

For the test we created a collection structure /rdf/testcase/test1 and
in this collection stored 1800 files each with a size of 313 KB. 

>From the Browser (IE6) a request to view the contents of this collection
(/rdf/testcase/test1) triggers a GET to this collection, 


    All Test Results are in Milliseconds. (by tracking Database Logs)
    =================================================================
    Operation                  OLD Schema     New Schema
    -----------------------------------------------------------------
    Starting                         1933           1559

    Put(testcase/test1/xxfile)       2544           1578

    Get(testcase)                    2805           2103	

    Get(testcase/test1)            174616          98477


Conclusion
----------

We believe the this is a good start for a new database schema. The test
results already show a large improvement and we believe addition code
modifications will result in a much greater improvement. 

In the process of redesigning the database we uncovered a number of
referential errors that were fixed by the new schema. For example,
problems we previously had with deadlocks when storing / updating very
large files are no longer an issue. 

We are looking for feedback from this proposal.

Thank you...

Ashok


Mime
View raw message