db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mike Matrigali <mikem_...@sbcglobal.net>
Subject is there some way to support bulk insert without new syntax, in a zero admin way?
Date Tue, 31 Aug 2010 18:28:47 GMT
bulk insert used by import is actually just a hint to the
underlying system to use a different query execution technique for the
inserts.
Rather than add syntax and force users to direct the system it
would be nice as a zero admin db if the system could figure
out itself whether or not to do the work, when doing an insert
that results in more than one row.

The cost of the bulk insert execution is:
o it has to do a scan of the table to determine if the table
   is empty.  If it is not empty then none of the bulk insert
   optimization is possible, so none of the following work is
   done.
o It has to create a new table and new indexes for each of the
   existing indexes.
o it has to drop the old table and indexes on completion.
o it has to do system catalog updates to reflect the swapping of
   new table and indexes for old.
o it needs to force all datapages of the new table and indexes
   to disk and wait for all the I/O's to finish before returning.
o it benefits from not logging the actual data of all the inserts
   done.


If handling this in the optimizer, one could easily estimate if
there are rows in the table or not.  And only build a bulk insert
plan if there are none.

Also there should be some threshold on the number of rows being
inserted before choosing a bulk insert plan.  One really only
wants to do this if there is a lot of data being inserted into
an empty table.  In the case of insert ... as select from ...
the optimizer definitely has an estimate for the number of rows
being inserted.  I am not sure what state this estimate is in
for table functions.

If the system is not doing so already any time you do a create table as 
select we could be using the "bulk" optimization.   I don't know if
this is the case.  In this case one can avoid the extra system catalog 
overhead, and would argue that it is reasonable to assume a large amount
of data is being inserted so that "bulk" optimization is reasonable.


Mime
View raw message