Message-ID: <4367A39E.2050507@debrunners.com>
Date: Tue, 01 Nov 2005 09:19:26 -0800
From: Daniel John Debrunner
To: derby-dev@db.apache.org
Subject: Re: Derby I/O issues during checkpointing

Øystein Grøvlen wrote:

> Some test runs we have done show very long transaction response times
> during checkpointing. This has been seen on several platforms. The
> load is TPC-B like transactions and the write cache is turned off, so
> the system is I/O bound. There seem to be two major issues:

Nice investigation. I think I have seen similar problems on Windows.

> 1. Derby does checkpointing by writing all dirty pages with
>    RandomAccessFile.write() and then doing a file sync when the entire
>    cache has been scanned. When the page cache is large, the file
>    system buffer will overflow during checkpointing, and occasionally
>    the writes will take very long. I have observed single write
>    operations that took almost 12 seconds. What is even worse is that
>    during this period read performance on other files can also be
>    very bad. For example, reading an index page from disk can take
>    close to 10 seconds while the base table is being checkpointed.
>    Hence, transactions are severely slowed down.
>
>    I have managed to improve response times by syncing each file on
>    every 100th write. Is this something we should consider including
>    in the code? Do you have better suggestions?

Sounds reasonable.
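For illustration, here is a minimal, untested sketch of what "sync on
every Nth write" during a checkpoint could look like. The class and
method names (CheckpointWriter, writeDirtyPages) are made up for this
sketch, not Derby's actual checkpoint code:

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.util.List;

    class CheckpointWriter {
        private static final int SYNC_INTERVAL = 100; // sync every 100th write

        // Write each dirty page at its offset, syncing periodically so
        // the file system buffer never accumulates an entire page cache
        // worth of unflushed data.
        void writeDirtyPages(RandomAccessFile file, List<byte[]> pages,
                             long[] offsets, int pageSize) throws IOException {
            int writesSinceSync = 0;
            for (int i = 0; i < pages.size(); i++) {
                file.seek(offsets[i]);
                file.write(pages.get(i), 0, pageSize);
                if (++writesSinceSync >= SYNC_INTERVAL) {
                    file.getFD().sync(); // bound the amount of unflushed data
                    writesSinceSync = 0;
                }
            }
            file.getFD().sync(); // final sync at the end of the checkpoint
        }
    }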
> 2. What makes things even worse is that only a single thread can read
>    a page from a file at a time. (Note that Derby has one file per
>    table.) This is because the implementation of RAFContainer.readPage
>    is as follows:
>
>       synchronized (this) { // 'this' is a FileContainer, i.e. a file object
>           fileData.seek(pageOffset); // fileData is a RandomAccessFile
>           fileData.readFully(pageData, 0, pageSize);
>       }
>
>    During a checkpoint, when I/O is slow, this creates long queues of
>    readers. In my run with 20 clients, I observed read requests that
>    took more than 20 seconds.

Hmmm, I think that code was written assuming the call would not take
that long!

>    This behavior also limits throughput and partly explains why I get
>    low CPU utilization with 20 clients. All my TPC-B clients are
>    serialized since most will need 1-2 disk accesses (an index leaf
>    page and one page of the account table).
>
>    Generally, in order to let the OS optimize I/O, one should have
>    many outstanding I/O calls at a time. (See Frederiksen, Bonnet:
>    "Getting Priorities Straight: Improving Linux Support for Database
>    I/O", VLDB 2005.)
>
>    I have attached a patch where I have introduced several file
>    descriptors (RandomAccessFile objects) per RAFContainer. These are
>    used for reading. The principle is that when all readers are busy,
>    a readPage request will create a new reader. (There is a maximum
>    number of readers.) With this patch, throughput was improved by
>    50% on Linux. The combination of this patch and syncing on every
>    100th write reduced maximum transaction response times by 90%.

My only concern would be the number of open file descriptors, as others
have pointed out. You might want to scavenge open descriptors from
containers that are no longer heavily used.

> The patch is not ready for inclusion into Derby, but I would like to
> hear whether you think this is a viable approach.

These changes seem low risk and enable worthwhile performance gains
without completely changing the I/O system. They could then provide the
performance baseline that a full async rewrite would have to beat (or
at least match).

Dan.
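P.S. Purely for concreteness, a rough, untested sketch of the
multiple-readers idea. The names here (PooledPageReader, acquire,
release) are illustrative, and this ignores RAFContainer's real
structure:

    import java.io.File;
    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.util.ArrayDeque;
    import java.util.Deque;

    class PooledPageReader {
        private final File containerFile;
        private final int maxReaders;
        private final Deque<RandomAccessFile> idle =
                new ArrayDeque<RandomAccessFile>();
        private int open; // descriptors created so far

        PooledPageReader(File containerFile, int maxReaders) {
            this.containerFile = containerFile;
            this.maxReaders = maxReaders;
        }

        // Concurrent reads proceed in parallel, each on its own
        // descriptor, instead of serializing on a single RandomAccessFile.
        void readPage(long pageOffset, byte[] pageData, int pageSize)
                throws IOException, InterruptedException {
            RandomAccessFile r = acquire();
            try {
                r.seek(pageOffset);
                r.readFully(pageData, 0, pageSize);
            } finally {
                release(r);
            }
        }

        private synchronized RandomAccessFile acquire()
                throws IOException, InterruptedException {
            while (idle.isEmpty()) {
                if (open < maxReaders) { // grow the pool on demand
                    open++;
                    return new RandomAccessFile(containerFile, "r");
                }
                wait(); // all readers busy; wait for a release
            }
            return idle.pop();
        }

        private synchronized void release(RandomAccessFile r) {
            idle.push(r);
            notify(); // wake one waiting reader
        }
    }

    // Note: descriptors are never closed here; a real version would want
    // to scavenge idle descriptors, per the concern above.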