From: Mike Matrigali
To: Derby Discussion
Date: Tue, 21 Sep 2004 16:45:39 -0700
Subject: Re: Derby performance and data volume

Some comments on max data capacity:

Derby stores each base table and each index in a single file, so the maximum data size is mostly whatever the file size limit is on the JVM/OS on which you are running. Early on, many JVMs and/or OSes limited file size to 2 GB, but I believe most Windows/Linux/Unix implementations have larger limits now. Derby is coded against the Java 64-bit interfaces to access these files, so internally the absolute maximum table size is something like 2**64 (including internal overhead in addition to user data).

Derby also requires all data to appear to the JVM as located on a single disk. To spread the database across multiple disks, one must configure the underlying hardware or software to make multiple disks look like one disk to the JVM.

BLOB/CLOB datatypes are stored inline in the same file as the other columns of the table, so they count against the above limits.

The number of tables or indexes is not limited, other than that the ids for the tables/indexes are 64-bit numbers, so you can have something like 2**64 total indexes/tables.

These limits have not been tested; I believe tables of a gigabyte or so have been used regularly by customers. While Derby can theoretically support tables of size 2**64, it does not have features one might expect from a VLDB database (all DDL, including index creation, is a single-threaded offline operation; all backup/restore is at the database unit level; only btree indexes are supported, with no compressed or bitmapped indexes; creating an index requires approximately twice the size of the index as free space to execute the create; I am sure I am missing more ...).
Little work was done in the past with an eye to very large DBs, so this may be an area ripe for contributions in the future. I would especially like to see requirements from those looking to use Derby for their large database needs.

Some comments on performance:

Derby should be able to perform similarly to other mainstream databases for standard SQL operations, if some care is taken by application writers. Its underlying index/disk scheme is similar to many other databases. It maintains a data cache so that frequently accessed pages are accessed quickly, while not requiring the entire DB to be in memory.

A couple of areas to watch for:

1) Optimization/compilation of queries in Derby is relatively costly; most work to this point has gone into making execution of already compiled queries perform better. The assumption has been that it is OK to take time optimizing/compiling the query, because the result will be cached and reexecution of the query the next time will pay no optimization/compilation cost. As discussed in another thread, queries will perform better if you use JDBC parameterized queries wherever possible (i.e. "insert into foo values (?, ?, ?)" rather than "insert into foo values (1, 2, 3)").

2) Derby is a 100% pure Java program, so initial execution of any of its code probably will not perform as well as an equivalent "C" coded program. But if the workload lends itself to reexecution of the code paths, then modern JITs will likely compile the critical code paths into machine code automatically. This means that benchmarks that do a single insert and measure the result will likely see very large improvements if they measure the 1000th iteration instead.

3) Watch out for autocommit. Derby executes a synchronous I/O for every commit, in order to guarantee recoverability of transactions. Derby JDBC programs are in autocommit mode by default. This means that very simple iteration programs often quickly become I/O bound by the log while using almost no CPU on modern processors.
By grouping multiple update statements into a single transaction (one commit), one can see throughput increase by 2 orders of magnitude (i.e. from 100 inserts/sec to 10,000 inserts/sec). Be aware that some databases do not sync their log at commit time in their default configuration.

It would be nice to run some open source benchmarking on Derby. I must admit I don't know of much available in this area; can anyone recommend any open source benchmarks? From recent threads it seems that others are already testing the performance of Derby. I hope they continue to post their results so that others can benefit.

David Zonsheine wrote:
> Hello All,
>
> Can someone please elaborate on performance and max data capacity of Derby?
>
> Thank you very much,
>
> David Zonsheine
> Manager, System Integration & Development PS&C
> Enigma
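
To make points 1 and 3 of the reply above concrete, here is a minimal JDBC sketch that prepares one parameterized insert, turns autocommit off, and commits once per group of rows. It is an illustration only, not code from the Derby project: the class name BatchInsertSketch, the method insertRows, and the rowsPerCommit parameter are made up for this example, and the table foo is assumed to have three INT columns.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper for illustration; assumes a table foo(a INT, b INT, c INT).
public class BatchInsertSketch {

    /**
     * Inserts every row through a single parameterized statement and
     * commits once per rowsPerCommit rows. Returns the number of commits.
     */
    public static int insertRows(Connection conn, int[][] rows, int rowsPerCommit)
            throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // otherwise every insert is its own commit
        int commits = 0;
        // Prepared once, so the statement is compiled once and then reexecuted.
        try (PreparedStatement ps =
                 conn.prepareStatement("INSERT INTO foo VALUES (?, ?, ?)")) {
            int pending = 0;
            for (int[] row : rows) {
                ps.setInt(1, row[0]);
                ps.setInt(2, row[1]);
                ps.setInt(3, row[2]);
                ps.executeUpdate();
                pending++;
                if (pending == rowsPerCommit) {
                    conn.commit(); // one commit covers the whole group
                    commits++;
                    pending = 0;
                }
            }
            if (pending > 0) { // commit the final, partially filled group
                conn.commit();
                commits++;
            }
        } catch (SQLException e) {
            conn.rollback(); // discard the uncommitted group on failure
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
        return commits;
    }
}
```

With Derby on the classpath, a connection could be obtained with something like DriverManager.getConnection("jdbc:derby:demoDB;create=true") (the database name demoDB is a placeholder) and passed to insertRows. The right value for rowsPerCommit depends on the workload; larger groups mean fewer synchronous log writes but more work lost if a group is rolled back.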