db-derby-dev mailing list archives

From "Mike Matrigali (JIRA)" <derby-...@db.apache.org>
Subject [jira] Commented: (DERBY-465) Embedded Derby-PointBase comparison
Date Tue, 19 Jul 2005 18:06:48 GMT
    [ http://issues.apache.org/jira/browse/DERBY-465?page=comments#action_12316139 ] 

Mike Matrigali commented on DERBY-465:
--------------------------------------

It looks like indexing has resolved the unexpected performance problem as the db grows.
As you measured, without indexes Derby as implemented has no option to perform updates/deletes
other than scanning the entire table, and the time for that scan is, as expected, linearly
related to the size of the table.

As you measured, with indexes the performance of these updates/deletes is relatively stable
from 0 to 5 million rows.
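
For reference, a minimal sketch of adding such an index through JDBC; the database name
"testdb", the table T and the column ID are hypothetical stand-ins for whatever column your
UPDATE/DELETE statements filter on:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class AddIndex {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
            try (Connection conn = DriverManager.getConnection("jdbc:derby:testdb");
                 Statement s = conn.createStatement()) {
                // Without an index on the filtered column, every
                // "UPDATE ... WHERE ID = ?" or "DELETE ... WHERE ID = ?" scans the table.
                s.executeUpdate("CREATE INDEX T_ID_IDX ON T (ID)");
            }
        }
    }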

The addition of each index is expected to slightly increase the cost of each insert.  In Derby
each index is implemented internally as another table, so a single insert into a table with one
index internally turns into an insert into the base table and an insert into a btree table.
Again, it is hard to guess how much each should take without seeing the sample data.  It looks
like the table structure is (integer, varchar, integer, date, bigint, float) - but I don't see
what the size of the varchar data is expected to be.
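
To make that concrete, this is roughly the DDL I am picturing; the column names and the
VARCHAR(100) length are guesses on my part, and that length is exactly the number needed to
estimate the per-row size:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class CreateSampleTable {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
            try (Connection conn = DriverManager.getConnection("jdbc:derby:testdb;create=true");
                 Statement s = conn.createStatement()) {
                // Guessed schema matching (integer, varchar, integer, date, bigint, float).
                s.executeUpdate("CREATE TABLE SAMPLE_T ("
                        + " C1 INTEGER, C2 VARCHAR(100), C3 INTEGER,"
                        + " C4 DATE, C5 BIGINT, C6 FLOAT)");
            }
        }
    }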

I am guessing that the variance you describe in derby-optimization.doc at each level is again
caused by recompiling the query; after compiling a query once, Derby will reuse the query plan
for subsequent executions of the same query.  This compilation is necessary because it looks
like the test is dropping and recreating the table at each of these points.  Once the table is
dropped, the Derby dependency system will automatically notice that the current query plan is
invalid and will recompile the query, even if you recreate the table with the same name and the
same columns.  This is the same effect you noted in your original document.
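
A sketch of that effect, assuming a hypothetical table T in a database "testdb": the first
execution pays the compile cost, later executions reuse the plan, and a DROP/CREATE in between
invalidates the plan so the next execution recompiles even though the statement text is
unchanged:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.Statement;

    public class RecompileDemo {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
            try (Connection conn = DriverManager.getConnection("jdbc:derby:testdb;create=true");
                 Statement ddl = conn.createStatement()) {
                ddl.executeUpdate("CREATE TABLE T (ID INTEGER, VAL VARCHAR(100))");

                PreparedStatement ps = conn.prepareStatement("UPDATE T SET VAL = ? WHERE ID = ?");
                ps.setString(1, "a");
                ps.setInt(2, 1);
                ps.executeUpdate();   // first execution pays the compile cost
                ps.executeUpdate();   // later executions reuse the cached plan

                ddl.executeUpdate("DROP TABLE T");
                ddl.executeUpdate("CREATE TABLE T (ID INTEGER, VAL VARCHAR(100))");

                // Same statement text, same columns, but the DROP invalidated the old
                // plan, so the dependency system recompiles on the next execution.
                ps.setString(1, "b");
                ps.setInt(2, 1);
                ps.executeUpdate();
            }
        }
    }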

Your results with the page cache are puzzling.  There are definite problems if you make the
cache too big for the OS/JVM to support and paging happens, but the sizes you picked don't seem
big enough for that to be the case.  Note that your choices for pageSize are incorrect.  There
are only 4 valid page sizes.  From the tuning guide: the property defines the page size, in
bytes, for on-disk database pages for tables or indexes, used during table or index creation.
Page size can only be one of the following values: 4096, 8192, 16384, or 32768.  Set this
property prior to issuing the CREATE TABLE or CREATE INDEX statement.  This value will be used
for the lifetime of the newly created conglomerates.

http://incubator.apache.org/derby/docs/10.0/manuals/tuning/perf87.html#IDX561
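
For example, a sketch of setting derby.storage.pageSize as a JVM system property before the
table is created; the database name, table name and the 8192 value chosen here are just
placeholders:

    public class PageSizeExample {
        public static void main(String[] args) throws Exception {
            // Must be one of 4096, 8192, 16384, 32768, and must be in effect
            // before the CREATE TABLE / CREATE INDEX it should apply to.
            System.setProperty("derby.storage.pageSize", "8192");

            Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
            try (java.sql.Connection conn =
                     java.sql.DriverManager.getConnection("jdbc:derby:testdb;create=true");
                 java.sql.Statement s = conn.createStatement()) {
                // Only conglomerates created after the property is set pick up the 8K pages.
                s.executeUpdate("CREATE TABLE BIGROWS (ID INTEGER, PAYLOAD VARCHAR(2000))");
            }
        }
    }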

Again, without knowing the expected size of your rows, it is hard to recommend a page size
change.  For this test there are probably 2 significant cache points:
1) fit the whole table/index into memory - as you add rows this becomes unreasonable
2) fit the index into memory - my guess is that the index fits into memory up to somewhere
   between 3 and 4 million rows.  Assuming an integer key, I used about 8 bytes per index entry
   as an estimate; a rough check of that arithmetic is sketched below.
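
A back-of-the-envelope sketch of that estimate; the ~8 bytes/entry figure is my guess from
above, 4096 is Derby's default page size, and the 10,000-page cache is a hypothetical
derby.storage.pageCacheSize to plug your own setting into:

    public class IndexCacheEstimate {
        public static void main(String[] args) {
            long rows = 4000000L;        // rows at the point where performance drops off
            long bytesPerEntry = 8;      // rough guess for an integer-key btree entry
            long pageSize = 4096;        // default derby.storage.pageSize
            long cachePages = 10000;     // hypothetical derby.storage.pageCacheSize

            long indexBytes = rows * bytesPerEntry;    // ~30 MB of index entries
            long cacheBytes = cachePages * pageSize;   // ~39 MB of page cache
            System.out.println("index ~" + (indexBytes / (1024 * 1024)) + " MB, cache ~"
                    + (cacheBytes / (1024 * 1024)) + " MB, index fits: "
                    + (indexBytes <= cacheBytes));
        }
    }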

On space usage, again it is hard to know what is appropriate without seeing the queries.  But
if your application does not expect to update the data such that it grows in size, then overall
space in the database can be saved by setting the reserved space option to 0, see:
http://incubator.apache.org/derby/docs/10.0/manuals/tuning/perf86.html#HDRSII-PROPER-28026
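
A sketch of doing that with derby.storage.pageReservedSpace set as a system property before the
table is created (by default a portion of each page is held back for rows that grow on update);
the database and table names are placeholders:

    public class ReservedSpaceExample {
        public static void main(String[] args) throws Exception {
            // 0 means no page space is reserved for rows growing on update;
            // only appropriate when updates never make rows longer.
            System.setProperty("derby.storage.pageReservedSpace", "0");

            Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
            try (java.sql.Connection conn =
                     java.sql.DriverManager.getConnection("jdbc:derby:testdb;create=true");
                 java.sql.Statement s = conn.createStatement()) {
                // The setting applies to conglomerates created after it is set.
                s.executeUpdate("CREATE TABLE LOGROWS (ID INTEGER, MSG VARCHAR(200))");
            }
        }
    }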

I also suggest that you include the version of Derby that you are using in your documents.
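
If it is easier than looking it up by hand, the JDBC metadata will report it; a small sketch,
again with a placeholder database name:

    import java.sql.Connection;
    import java.sql.DatabaseMetaData;
    import java.sql.DriverManager;

    public class PrintDerbyVersion {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
            try (Connection conn = DriverManager.getConnection("jdbc:derby:testdb;create=true")) {
                DatabaseMetaData md = conn.getMetaData();
                // prints the product name and version string of the running engine
                System.out.println(md.getDatabaseProductName() + " "
                        + md.getDatabaseProductVersion());
            }
        }
    }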
     

> Embedded Derby-PointBase comparison
> -----------------------------------
>
>          Key: DERBY-465
>          URL: http://issues.apache.org/jira/browse/DERBY-465
>      Project: Derby
>         Type: Wish
>   Components: Test
>     Versions: 10.0.2.1, 10.0.2.0
>  Environment: Windows Server 2003, 4 processors, summary CPU 3.00 Ghz, RAM 1 Gb
>     Reporter: Peter Kovgan
>  Attachments: Benchmarks_info_independent.doc, derby-optimization.doc
>
> I have tested 4 major embedded DBs.
> I have found that the major disadvantages of Derby are
> 1) low insert speed and
> 2) significant performance degradation in select, update, and delete operation speed
> starting from some table size.
> PointBase, in comparison, has no such degradation.
> It would be better if you improved your product.
> Good luck and thank you.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

