db-derby-user mailing list archives

From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: Derby performance and data volume
Date Tue, 21 Sep 2004 23:45:39 GMT
some comments on max data capacity:

derby stores each base table and each index in a single file, so the
data size limit is mostly whatever the file-size limit is on the JVM/OS
on which you are running.  Early on there were many JVMs and/or OS's that
limited file size to 2 gigabytes, but I believe most windows/linux/unix
implementations have larger limits now.  Derby is coded against the java
64-bit file interfaces, so internally the absolute maximum
table size is something like 2**64 bytes (including internal overhead
in addition to user data).  Also derby requires all of a database's data
to appear to the JVM as a single disk.  To spread the database across
multiple disks one must configure the underlying hardware or software to
make multiple disks look like one disk to the JVM.

BLOB/CLOB datatypes are stored inline in the same file as the other
columns of the table, so they count against the above limits.

The number of tables or indexes is not limited, other than that the ids
for the tables/indexes are 64-bit numbers, so you can have something like
2**64 total indexes/tables.

These limits have not been tested; I believe tables of a gigabyte or so
have been used regularly by customers.  While derby can theoretically
support tables up to 2**64 bytes, it does not have features one might
expect from a VLDB database (all ddl including index creation is a
single-threaded offline operation, backup/restore works only at the
whole-database level, only btree indexes are supported (no compressed
indexes, no bitmapped), creating an index requires free space of
approximately twice the size of the index to execute the create, and I
am sure I am missing more ...).

Little work was done in the past with an eye to very large databases, so
this may be an area ripe for contributions in the future.  I would
especially like to see requirements from those looking to use derby for
their large database needs.

some comments on performance:

Derby should be able to perform similarly to other mainstream databases
for standard SQL operations, if some care is taken by application writers.
Its underlying index/disk scheme is similar to many other databases',
and it maintains a data cache so frequently accessed pages are read
quickly without requiring the entire db to be in memory.

A couple of areas to watch for:

1) optimization/compilation of queries in derby is relatively costly;
most work to this point has been on making execution of already-compiled
queries perform better.  The assumption has been that it is ok to take
time optimizing/compiling the query, because the result will be cached
and re-execution of the query the next time will pay no
optimization/compilation cost.

As discussed in another thread, queries will perform better if you use
JDBC parameterized queries wherever possible (ie. insert into foo
values (?, ?, ?) rather than insert into foo values (1, 2, 3)).
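A minimal sketch of the idea in JDBC; the table name foo and the helper names here are made up for illustration, and a live Connection to a Derby database is assumed:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class ParamQueryDemo {

    // Build a parameterized INSERT with one '?' per column, so the engine
    // can compile the statement once and reuse the cached plan per row.
    static String buildInsert(String table, int columnCount) {
        StringBuilder sql = new StringBuilder("insert into " + table + " values (");
        for (int i = 0; i < columnCount; i++) {
            sql.append(i == 0 ? "?" : ", ?");
        }
        return sql.append(")").toString();
    }

    // Hypothetical usage: reuse one PreparedStatement rather than sending a
    // freshly built literal SQL string (forcing recompilation) for each row.
    static void insertRows(Connection conn, int[][] rows) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(buildInsert("foo", 3))) {
            for (int[] row : rows) {
                ps.setInt(1, row[0]);
                ps.setInt(2, row[1]);
                ps.setInt(3, row[2]);
                ps.executeUpdate();
            }
        }
    }
}
```

With literals, each distinct statement text misses the statement cache; with '?' placeholders, every execution hits the same cached plan.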

2) As a 100% pure java program, initial execution of any java code
probably will not perform as well as an equivalent "c" coded program.
But if the workload lends itself to re-execution of the code paths, then
modern JITs will likely compile the critical code paths into machine
code automatically.

This means that benchmarks that do a single insert and measure the
result will likely see very large improvements if they measure the
1000th iteration instead.
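The measurement pattern above can be sketched in plain java (no Derby needed); the workload here is a stand-in for a real insert, and the helper names are my own:

```java
public class WarmupDemo {

    // Time one execution of a task, in nanoseconds.
    static long timeOnce(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    // Measure the first iteration and the nth iteration separately; once
    // the JIT has compiled the hot path, the nth is usually far cheaper.
    static long[] firstVsNth(Runnable task, int n) {
        long first = timeOnce(task);
        long nth = first;
        for (int i = 1; i < n; i++) {
            nth = timeOnce(task);
        }
        return new long[] { first, nth };
    }

    public static void main(String[] args) {
        Runnable work = () -> {
            long sum = 0;
            for (int i = 0; i < 100_000; i++) sum += (long) i * i;
            if (sum == 42) System.out.println(sum); // keep the loop live
        };
        long[] t = firstVsNth(work, 1000);
        System.out.println("1st: " + t[0] + " ns, 1000th: " + t[1] + " ns");
    }
}
```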

3) Watch out for autocommit.
Derby executes a synchronous I/O for every commit, in order
to guarantee recoverability of transactions.  Derby jdbc programs are
in autocommit mode by default.  This means that very simple iterating
programs often quickly become I/O bound on the log while using
almost no cpu on modern processors.   By grouping multiple update
statements into a single transaction one can see throughput increase by
two orders of magnitude (ie. 100 inserts/second go to 10000/sec).

Be aware that some databases don't sync their log at commit time in
their default configuration.
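A sketch of the grouping technique, assuming a live Derby Connection; the table foo and the helper names are invented for illustration:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchInsertDemo {

    // How many commits (and thus synchronous log syncs) a run of `rows`
    // inserts needs when grouped `batchSize` per transaction; under
    // autocommit it would be one per row, i.e. `rows`.
    static int commitsNeeded(int rows, int batchSize) {
        return (rows + batchSize - 1) / batchSize;
    }

    // Hypothetical sketch: turn off autocommit so the log is synced once
    // per group of inserts rather than once per insert.
    static void bulkInsert(Connection conn, int rows, int batchSize) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement("insert into foo values (?)")) {
            for (int i = 1; i <= rows; i++) {
                ps.setInt(1, i);
                ps.executeUpdate();
                if (i % batchSize == 0) conn.commit();
            }
            if (rows % batchSize != 0) conn.commit(); // trailing partial batch
        }
    }
}
```

With 1000 rows and a batch size of 100 this pays for 10 log syncs instead of 1000, which is where the throughput jump comes from.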

It would be nice to run some open source benchmarking on derby.  I must
admit I don't know of much available in this area, can anyone recommend
any open source benchmarks?  From recent threads it seems like others
are already testing out the performance of derby, I hope they continue
to post their results so that others can benefit.

David Zonsheine wrote:

> Hello All,
> Can someone please elaborate on performance and max data capacity of Derby?
> Thank you very much,
> *David Zonsheine*
> Manager, System Integration & Development PS&C
> Enigma
> Tel: +972-9-9569955 ext. 309
> Fax: +972-9-9560474
> Mobile: +972-54-6658784
> mailto:DavidZ@enigma.com 
> http://www.enigma.com
