From: Oystein Grovlen - Sun Norway
Date: Wed, 24 May 2006 11:36:16 +0200
To: derby-dev@db.apache.org
Subject: Re: [jira] Created: (DERBY-801) Allow parallel access to data files.
Message-id: <44742910.2020607@sun.com>
In-reply-to: <20060524073729.GA5933@stud.ntnu.no>

Anders Morken wrote:
> I've played around a bit with a different approach - using the
> FileChannel class from Java 1.4's new IO API. I've written a class
> RAFContainer4 which extends RAFContainer and overrides the readPage and
> writePage methods of that class to use read/write(ByteBuffer buf, long
> position) in FileChannel to access the container's file, without
> synchronizing on the FileContainer during the read and write calls.
>
> With a bit of hackery in BaseDataFileFactory#newContainerObject() this
> class is then used instead of the regular RAFContainer on creation of
> new RAFContainer objects when Derby runs in a 1.4+ JVM.
>
> This approach gives the JVM and OS the opportunity to issue multiple
> file operations concurrently, although we have no guarantees that this
> will actually happen. This is JVM/OS dependent, but stracing the Sun
> 1.4.2_09 VM on Linux 2.6 shows that the VM now uses pread64()/pwrite64()
> system calls instead of seek(), read() and write(). pread and pwrite
> have similar semantics to the FileChannel#read/write(ByteBuffer buf,
> long position) methods, and do not alter the file's seek() position, and
> are supposed to be thread safe.

Great, Anders.  This looks like a promising idea.

>
> Of course only people running Derby on 1.4+ JVMs will have the
> opportunity to benefit from this approach. As support for 1.3 is to be
> deprecated this might not be much of an issue?

If this means that 1.3 still works, but the old way, I think this is
acceptable.

> But anyway, I would like to see if this hack of mine actually works. I
> see mentions of a "TPC-B like benchmark" in the threads Øystein links to
> above, and wonder if that is something Sun internal, or if it's a
> publicly available benchmark implementation that I can get my grubby
> little paws on and try out this patch with? =)

The actual code is something we have developed internally here, and I
am not sure we will have time to make it available any time soon.  If
you make a patch of your changes, I should be able to test this next
week.

If you want to try this out yourself, I think you should be able to
make a sufficient test client quickly.  (However, it will take some
time to create the large database.)  What I used was:

   1. A database much larger than the physical memory of the computer.
      I think I had around 17 GB of data including indexes.
   2. A large page cache.  I used 500 MB on a computer with 2 GB RAM.
   3. Log device on a separate disk.  (I.e., you need a computer with
      2 disks.)
   4. I used TPC-B like transactions, but I would expect that any load
      where transactions access records in a large table by primary
      key should work.  Try to avoid frequent lock conflicts or
      deadlocks.  (E.g., two random accesses to the same table within
      a transaction is deadlock-prone.)
   5. A multi-threaded application where all threads run the same type
      of transaction back-to-back.  (I had 20 threads.)  Our
      application prints throughput per thread and total throughput
      for every 10 second interval, and an average at the end.
   6. Run for at least 30 minutes to allow several checkpoints to
      happen during your run.  (I ran for 1 hour.)

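The client described in items 5 and 6 could be sketched roughly as follows.
This is only a minimal illustration, not the internal Sun application: the
class name ThroughputHarness, the run() signature and the output format are
hypothetical, and it assumes a modern JVM (Java 8 or later) rather than the
1.4 JVMs discussed in this thread.  The transaction itself is passed in as a
Runnable.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical harness: every thread runs the same transaction back-to-back,
// and the caller's thread prints per-thread and total throughput per interval.
final class ThroughputHarness {

    /** Runs txn on 'threads' threads; returns the number of completed transactions. */
    static long run(int threads, long durationMillis, long intervalMillis, Runnable txn) {
        AtomicLongArray done = new AtomicLongArray(threads);
        Thread[] workers = new Thread[threads];
        final long deadline = System.nanoTime() + durationMillis * 1_000_000L;
        for (int i = 0; i < threads; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                // Back-to-back transactions until the run time is used up.
                while (deadline - System.nanoTime() > 0) {
                    txn.run();
                    done.incrementAndGet(id);
                }
            });
            workers[i].start();
        }
        long[] last = new long[threads];
        try {
            while (deadline - System.nanoTime() > 0) {
                Thread.sleep(intervalMillis);
                report(done, last, intervalMillis);
            }
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            // Stop reporting early; restore the interrupt flag for the caller.
            Thread.currentThread().interrupt();
        }
        long total = 0;
        for (int i = 0; i < threads; i++) {
            total += done.get(i);
        }
        System.out.printf("average: %.1f tx/s%n", total * 1000.0 / durationMillis);
        return total;
    }

    // Prints transactions per second since the previous report. The rates are
    // approximate because Thread.sleep() may overshoot the interval slightly.
    private static void report(AtomicLongArray done, long[] last, long intervalMillis) {
        long sum = 0;
        StringBuilder perThread = new StringBuilder();
        for (int i = 0; i < last.length; i++) {
            long now = done.get(i);
            long delta = now - last[i];
            last[i] = now;
            sum += delta;
            perThread.append(String.format(" t%d=%.1f", i, delta * 1000.0 / intervalMillis));
        }
        System.out.printf("total=%.1f tx/s |%s%n", sum * 1000.0 / intervalMillis, perThread);
    }

    public static void main(String[] args) {
        // Demo with a no-op transaction; the run described above would use
        // 20 threads, 10 second intervals and a run time of 30 to 60 minutes.
        run(4, 1_000, 250, () -> { });
    }
}
```

With the setup above, the call would look like
run(20, 3_600_000, 10_000, txn), where txn executes one transaction on a
connection owned by the calling thread.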
A short description of our TPC-B like app:

4 tables:
   branch(bid int, bbal int, junk char(92), primary key(bid))
   teller(tid int, bid int, tbal int, junk char(88), primary key(tid))
   account(aid int, bid int, abal int, junk char(88), primary key(aid))
   history(aid int, tid int, bid int, delta int, tstamp timestamp,
           primary key(tstamp,aid,delta))

All primary keys are numbered from 0 to n-1, where n is the number of
rows in the table.  *bal columns are initially 0.
teller: bid=tid/10, account: bid=aid/100000

I had 1000 branches, 10000 tellers and 100 million accounts.
The history table is initially empty.

A transaction has 5 statements:
   update account set abal = abal+? where aid=? and bid=?
   insert into history values (?, ?, ?, ?, CURRENT_TIMESTAMP)
   update teller set tbal = tbal+? where tid=? and bid=?
   update branch set bbal = bbal+? where bid=?
   select abal from account where aid=?

In TPC-B there are some rules about selecting teller and account where
a certain percentage of transactions will use a teller from a
different branch than the branch of the account, but I do not think
that will matter here.  I suggest randomly determining a balance
change (we use a number between -1 million and 1 million) and a
random aid, and determining tid and bid based on the aid.  (I.e.,
tid=aid/10000, bid=aid/100000, which is the same as bid=tid/10.)

-- 
Øystein
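
The transaction above could look roughly like this over JDBC.  This is a
minimal sketch, not the internal application: the class TpcbLikeTxn and its
method names are hypothetical.  It assumes 10,000 accounts per teller and
100,000 accounts per branch, which follows from the table sizes given above
(1000 branches, 10000 tellers, 100 million accounts) together with
account: bid=aid/100000.  For brevity it prepares each statement on every
call; a real benchmark client would more likely prepare them once per
connection.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical client code for one TPC-B like transaction.
final class TpcbLikeTxn {
    // Assumed ratios: 100 million accounts, 10000 tellers, 1000 branches.
    static final int ACCOUNTS_PER_TELLER = 10_000;
    static final int ACCOUNTS_PER_BRANCH = 100_000;

    static int tellerFor(int aid) { return aid / ACCOUNTS_PER_TELLER; }
    static int branchFor(int aid) { return aid / ACCOUNTS_PER_BRANCH; }

    /** Runs the five statements for a random account; returns the selected abal. */
    static int runOnce(Connection con, int accounts) throws SQLException {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        int aid = rnd.nextInt(accounts);                 // random account, 0..accounts-1
        int tid = tellerFor(aid);
        int bid = branchFor(aid);
        int delta = rnd.nextInt(-1_000_000, 1_000_001);  // balance change
        con.setAutoCommit(false);
        try {
            update(con, "update account set abal = abal+? where aid=? and bid=?",
                   delta, aid, bid);
            update(con, "insert into history values (?, ?, ?, ?, CURRENT_TIMESTAMP)",
                   aid, tid, bid, delta);
            update(con, "update teller set tbal = tbal+? where tid=? and bid=?",
                   delta, tid, bid);
            update(con, "update branch set bbal = bbal+? where bid=?",
                   delta, bid);
            int abal = 0;
            try (PreparedStatement ps =
                     con.prepareStatement("select abal from account where aid=?")) {
                ps.setInt(1, aid);
                try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next()) {
                        abal = rs.getInt(1);
                    }
                }
            }
            con.commit();
            return abal;
        } catch (SQLException e) {
            con.rollback();  // e.g. after a deadlock or lock timeout
            throw e;
        }
    }

    // Binds the int parameters in order and executes one update or insert.
    private static void update(Connection con, String sql, int... args) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            for (int i = 0; i < args.length; i++) {
                ps.setInt(i + 1, args[i]);
            }
            ps.executeUpdate();
        }
    }
}
```

Because tid and bid are both derived from the same aid, every transaction
touches exactly one row in each of account, teller and branch, in the same
table order.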