db-derby-user mailing list archives

From Rick Hillegas <Richard.Hille...@Sun.COM>
Subject Re: faster inserts in big tables
Date Fri, 12 Dec 2008 17:10:17 GMT
Knut Anders Hatlen wrote:
> Rick Hillegas <Richard.Hillegas@Sun.COM> writes:
>
>> Knut Anders Hatlen wrote:
>>     
>>> Rick Hillegas <Richard.Hillegas@Sun.COM> writes:
>>>
>>>
>>>> Hi Brian,
>>>>
>>>> In a previous response Peter Ondruška noted that you could use bulk
>>>> import to speed up your inserts if your source data lived in a
>>>> properly formatted file.
>>>>
>>>> Even if your source data does not live in a file, you can still get
>>>> bulk import speed (and the benefits of your generated key) by using
>>>> table functions--provided that you upgrade to 10.4. If you wrap your
>>>> source data in a table function, then you can bulk insert your data as
>>>> follows:
>>>>
>>>> insert into MY_TABLE( NON_GEN_COL1, ... NON_GEN_COL_N )
>>>> select * from table( MY_TABLE_FUNCTION() ) s
>>>>
>>> The internal SQL syntax allows you to add an optimizer override here
>>> (--DERBY-PROPERTIES insertMode=bulkInsert) which I think speeds it up
>>> even more, though I'm not sure exactly what it buys you. This override
>>> is used in the import code and in the SYSCS_UTIL.SYSCS_BULK_INSERT
>>> system procedure, but you're not allowed to use it in your own SQL
>>> queries, it seems. The (undocumented) SYSCS_BULK_INSERT procedure does
>>> exactly what Rick's example does, except that it also adds the optimizer
>>> override.
>>>
>>> The procedure only works on old-style table functions, though. Does
>>> anyone know what performance gains one can expect by using
>>> SYSCS_BULK_INSERT instead of INSERT INTO ... SELECT FROM? Would it be
>>> worth the effort to create a similar mechanism for the new-style table
>>> functions?
>>>
>>>
>> Hi Knut,
>>
>> It appears to me that this setting (insertMode=bulkInsert) is
>> inspected when binding an INSERT statement. If the bind() logic finds
>> this setting, then it escalates the lock mode for the target table to
>> be a table level lock. Other than that, I cannot find any other use of
>> this setting. Comments around this code, however, indicate that there
>> might once have been other implications and that this setting, which
>> is now unconditionally accepted, used to be silently rejected in
>> certain situations, including insertion into a synchronized client
>> database, insertion into a table which has triggers, and deferred mode
>> insertion (see the comments on InsertNode.verifyBulkInsert).
>>
>> It appears to me that the benefit now conferred by this setting can be
>> achieved by setting the lock mode on the table to be table level. So
>> you would preface the INSERT statement above with the following
>> statement:
>>
>> lock MY_TABLE in exclusive mode
>>
>> That, at least, is what I have gleaned from a code inspection.
>>     
>
> Thanks Rick,
>
> Did you also look at the code in InsertResultSet.open()? It looks like
> it has some special handling of triggers, foreign keys and checking of
> constraints when running in bulk insert mode.
>
Hi Knut,

Thanks for finding that. It looks like InsertResultSet optimizes the 
performance of bulk inserts as follows:

1) Claims to turn off row logging. That is, it performs the insert via 
TransactionController.createAndLoadConglomerate() and 
recreateAndLoadConglomerate(). Perhaps you could take a look at those 
methods and verify what they really do.

2) Bulk-enforces foreign keys defined on the table. This is an optimized 
zippering together of the indexes which support a foreign/primary key 
pair. As I look at this code, however, it seems to me that you may 
actually get worse performance if you are bulk-inserting (not replacing) 
a small number of rows into a table which already contains a lot of rows.
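[For the archives: the "zippering" described above is conceptually a merge of the two sorted indexes. It is roughly equivalent to the following anti-join, which scans both tables rather than probing the parent index once per inserted row; the CHILD/PARENT names are hypothetical, and this is not Derby's actual code path:]

```sql
-- Hypothetical schema: CHILD.PARENT_ID references PARENT.ID.
-- Bulk enforcement walks both supporting indexes in sorted order,
-- which finds the same violations as this anti-join:
SELECT c.PARENT_ID
FROM CHILD c LEFT OUTER JOIN PARENT p
  ON c.PARENT_ID = p.ID
WHERE p.ID IS NULL;
-- Row-by-row enforcement instead probes the PARENT index once per
-- inserted row, which can be cheaper when few rows are inserted
-- into a large table.
```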

There is some additional trickiness dealing with the interaction of 
BEFORE triggers and CHECK constraints, but (famous last words) it 
appears to me that the processing is still row-by-row in the language 
layer, so I don't see any optimization involved there.

I guess the bottom line is that bulk insert gives you better performance 
because row logging is disabled. The bulk enforcement of foreign keys 
may either improve or degrade your performance.
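[For the archives: putting the two user-level pieces of this thread together, a sketch of the recipe looks like the following. MY_TABLE, MY_TABLE_FUNCTION, and the column names are the placeholders from the example earlier in the thread, not real objects:]

```sql
-- First escalate to a table-level lock, which is the main effect
-- of insertMode=bulkInsert that survives in the current code:
LOCK TABLE MY_TABLE IN EXCLUSIVE MODE;

-- Then bulk insert through a 10.4 table function, letting the
-- generated key column fill itself in:
INSERT INTO MY_TABLE ( NON_GEN_COL1, ... NON_GEN_COL_N )
  SELECT s.* FROM TABLE( MY_TABLE_FUNCTION() ) s;
```

[Both statements use documented Derby SQL; whether the lock alone recovers all of the bulk-insert speedup is, per the discussion above, only a conclusion from code inspection.]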

So those are a few more clues.

Cheers,
-Rick
