From derby-user-return-10265-apmail-db-derby-user-archive=db.apache.org@db.apache.org Fri Dec 12 17:10:49 2008 Return-Path: Delivered-To: apmail-db-derby-user-archive@www.apache.org Received: (qmail 24233 invoked from network); 12 Dec 2008 17:10:48 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 12 Dec 2008 17:10:48 -0000 Received: (qmail 98486 invoked by uid 500); 12 Dec 2008 17:11:00 -0000 Delivered-To: apmail-db-derby-user-archive@db.apache.org Received: (qmail 98449 invoked by uid 500); 12 Dec 2008 17:11:00 -0000 Mailing-List: contact derby-user-help@db.apache.org; run by ezmlm Precedence: bulk list-help: list-unsubscribe: List-Post: List-Id: Reply-To: "Derby Discussion" Delivered-To: mailing list derby-user@db.apache.org Received: (qmail 98440 invoked by uid 99); 12 Dec 2008 17:11:00 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 12 Dec 2008 09:11:00 -0800 X-ASF-Spam-Status: No, hits=-4.0 required=10.0 tests=RCVD_IN_DNSWL_MED,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: local policy) Received: from [192.18.43.133] (HELO sca-es-mail-2.sun.com) (192.18.43.133) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 12 Dec 2008 17:10:38 +0000 Received: from fe-sfbay-09.sun.com ([192.18.43.129]) by sca-es-mail-2.sun.com (8.13.7+Sun/8.12.9) with ESMTP id mBCHAHDk013720 for ; Fri, 12 Dec 2008 09:10:17 -0800 (PST) Received: from conversion-daemon.fe-sfbay-09.sun.com by fe-sfbay-09.sun.com (Sun Java System Messaging Server 6.2-8.04 (built Feb 28 2007)) id <0KBR00201W2VUL00@fe-sfbay-09.sun.com> (original mail from Richard.Hillegas@Sun.COM) for derby-user@db.apache.org; Fri, 12 Dec 2008 09:10:17 -0800 (PST) Received: from richard-hillegas-computer.local ([129.150.17.160]) by fe-sfbay-09.sun.com (Sun Java System Messaging Server 6.2-8.04 (built Feb 28 2007)) with ESMTPSA id <0KBR00J1NX1559I0@fe-sfbay-09.sun.com> for derby-user@db.apache.org; Fri, 12 Dec 2008 09:10:17 -0800 (PST) Date: Fri, 12 Dec 2008 09:10:17 -0800 From: Rick Hillegas Subject: Re: faster inserts in big tables In-reply-to: Sender: Richard.Hillegas@Sun.COM To: Derby Discussion Message-id: <49429AF9.7050507@sun.com> MIME-version: 1.0 Content-type: text/plain; format=flowed; charset=ISO-8859-15 Content-transfer-encoding: 8BIT References: <1534204609.15814411227648807881.JavaMail.javamailuser@localhost> <20957504.post@talk.nabble.com> <4941499B.5090301@sun.com> <49418806.40100@sun.com> User-Agent: Thunderbird 2.0.0.18 (Macintosh/20081105) X-Virus-Checked: Checked by ClamAV on apache.org Knut Anders Hatlen wrote: > Rick Hillegas writes: > > >> Knut Anders Hatlen wrote: >> >>> Rick Hillegas writes: >>> >>> >>> >>>> Hi Brian, >>>> >>>> In a previous response Peter Ondruška noted that you could use bulk >>>> import to speed up your inserts if your source data lived in a >>>> properly formatted file. >>>> >>>> Even if your source data does not live in a file, you can still get >>>> bulk import speed (and the benefits of your generated key) by using >>>> table functions--provided that you upgrade to 10.4. If you wrap your >>>> source data in a table function, then you can bulk insert your data as >>>> follows: >>>> >>>> insert into MY_TABLE( NON_GEN_COL1, ... NON_GEN_COL_N ) >>>> select * from table( MY_TABLE_FUNCTION() ) s >>>> >>>> >>> The internal SQL syntax allows you to add an optimizer override here >>> (--DERBY-PROPERTIES insertMode=bulkInsert) which I think speeds it up >>> even more, though I'm not sure exactly what it buys you. This override >>> is used in the import code and in the SYSCS_UTIL.SYSCS_BULK_INSERT >>> system procedure, but you're not allowed to use it in your own SQL >>> queries, it seems. The (undocumented) SYSCS_BULK_INSERT procedure does >>> exactly what Rick's example does, except that it also adds the optimizer >>> override. >>> >>> The procedure only works on old-style table functions, though. Does >>> anyone know what performance gains one can expect by using >>> SYSCS_BULK_INSERT instead of INSERT INTO ... SELECT FROM? Would it be >>> worth the effort to create a similar mechanism for the new-style table >>> functions? >>> >>> >>> >> Hi Knut, >> >> It appears to me that this setting (insertMode=bulkInsert) is >> inspected when binding an INSERT statement. If the bind() logic finds >> this setting, then it escalates the lock mode for the target table to >> be a table level lock. Other than that, I cannot find any other use of >> this setting. Comments around this code, however, indicate that there >> might once have been other implications and that this setting, which >> is now unconditionally accepted, used to be silently rejected in >> certain situations, including insertion into a synchronized client >> database, insertion into a table which has triggers, and deferred mode >> insertion (see the comments on InsertNode.verifyBulkInsert). >> >> It appears to me that the benefit now conferred by this setting can be >> achieved by setting the lock mode on the table to be table level. So >> you would preface the INSERT statement above with the following >> statement: >> >> lock MY_TABLE in exclusive mode >> >> That, at least, is what I have gleaned from a code inspection. >> > > Thanks Rick, > > Did you also look at the code in InsertResultSet.open()? It looks like > it has some special handling of triggers, forreign keys and checking of > constraints when running in bulk insert mode. > > Hi Knut, Thanks for finding that. It looks like InsertResultSet optimizes the performance of bulk inserts as follows: 1) Claims to turn off row logging. That is, performs the insert via TransactionController.createAndLoadConglomerate() and recreateAndLoadConglomerate(). Perhaps you could take a look at those methods and verify what they really do. 2) Bulk-enforces foreign keys defined on the table. This is an optimized zippering together of the indexes which support a foreign/primary key pair. As I look at this code, however, it seems to me that you may actually get worse performance if you are bulk inserting (not replacing) a small number of rows into a table which already has a lot of rows in it. There is some additional trickiness dealing with the interaction of BEFORE triggers and CHECK constrains, but (famous last words) it appears to me that the processing is still row-by-row in the language layer so I don't see any optimization involved here. I guess the bottom line is that with bulk insert, you get better performance because logging is disabled. The bulk enforcement of foreign keys may improve or degrade your performance. So those are a few more clues. Cheers, -Rick