db-derby-user mailing list archives

From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: Large multi-record insert performance
Date Fri, 16 Mar 2007 16:51:00 GMT


Sedillo, Derek (Mission Systems) wrote:
> Dyre,
> 
> The goal is to find the most efficient/optimized way to insert large
> amounts of data into Derby.  For example in working as an Oracle DBA I
> have discovered that I can bulk load data from ProC using an array of C
> Structures in one insert statement like this:
> 
> INSERT INTO SSJ 
> VALUES (:tmp_ssj_data);  // Where tmp_ssj_data is an array (100s or
> 1000s) of structured records 
> 
> This approach greatly enhances performance for large data inserts which
> we perform regularly. My original question is how can I do something
> similar with Derby.
> 
> While I realize there are 'basic' constructs for performing a task, they
> are not 'normally' optimal.  For example performing 1000 separate
> transactions is less efficient than one.  See Daniel's numbers below.
I assume that Daniel's non-batch numbers below were generated using
code similar to that at the end of this message, i.e. set autocommit to
false, loop doing inserts, and commit after some number of inserts.
Derby is different from other databases when used in embedded mode:
some optimizations that are necessary in client/server applications are
not necessary in Derby.

As you can see in the results, Derby currently does not optimize batch
processing in embedded mode, so if you are embedded I would use code
similar to what is included below.  The biggest performance win is to
insert as many rows as possible between commits.  Each commit does a
synchronous I/O and waits for it to hit disk, so on modern processors
you will quickly become I/O-wait bound unless you make the commits big
enough.  "Big enough" depends on the data; for instance, below it looks
like the optimal number may be bigger than 100000, but the difference
between 10k and 100k is not much (roughly 6250 rows/sec for 10k versus
6924 rows/sec for 100k).
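
For instance, a minimal sketch of that pattern might look like the
following; the table T, its three columns, and the 10,000-row commit
size are just placeholders to adapt to your own schema, and the commit
size is the knob worth experimenting with:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class BulkInsertExample {
    // Rows per commit; larger values amortize the synchronous log flush.
    private static final int COMMIT_SIZE = 10000;

    public static void main(String[] args) throws Exception {
        // Load the embedded driver (not needed with JDBC 4 auto-loading).
        Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
        Connection conn =
            DriverManager.getConnection("jdbc:derby:testdb;create=true");
        conn.setAutoCommit(false);

        // Placeholder table; substitute your real schema.
        Statement s = conn.createStatement();
        s.executeUpdate("CREATE TABLE T (ID INT, TAG VARCHAR(32), VAL INT)");
        s.close();

        PreparedStatement ps =
            conn.prepareStatement("INSERT INTO T VALUES (?, ?, ?)");
        for (int i = 0; i < 100000; ++i) {
            ps.setInt(1, i);
            ps.setString(2, "tag-" + i);
            ps.setInt(3, i % 7);
            ps.executeUpdate();
            if ((i + 1) % COMMIT_SIZE == 0) {
                conn.commit();   // one synchronous I/O per COMMIT_SIZE rows
            }
        }
        conn.commit();           // commit any remaining rows
        ps.close();
        conn.close();
    }
}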
> Along these lines I did some testing for our own application.
> 
> Cached PreparedStatement but one transaction per insert:
> 100 tags added in 399ms
> 100 tags removed in 160ms
> 1000 tags added in 1163ms
> 1000 tags removed in 873ms
> 10000 tags added in 6094ms
> 10000 tags removed in 6840ms
> 100000 tags added in 58563ms
> 100000 tags removed in 67342ms
> 
> All in one transaction using executeUpdate():
> 100 tags added in 274ms
> 100 tags removed in 70ms
> 1000 tags added in 299ms
> 1000 tags removed in 250ms
> 10000 tags added in 1605ms
> 10000 tags removed in 1500ms
> 100000 tags added in 14441ms
> 100000 tags removed in 19721ms
> 
> All in one transaction using addBatch()/executeBatch():
> 100 tags added in 290ms
> 100 tags removed in 76ms
> 1000 tags added in 316ms
> 1000 tags removed in 258ms
> 10000 tags added in 1621ms
> 10000 tags removed in 1927ms
> 100000 tags added in 14971ms
> 100000 tags removed in 19320ms 
> 
> - Derek
> 
> -----Original Message-----
> From: Dyre.Tjeldvoll@Sun.COM [mailto:Dyre.Tjeldvoll@Sun.COM] 
> Sent: Friday, March 16, 2007 9:00 AM
> To: Derby Discussion
> Subject: Re: Large multi-record insert performance
> 
> I haven't followed this thread closely, but is there a reason why you
> cannot do:
> 
> PreparedStatement ps =
> conn.prepareStatement("insert into T values (?, ?, ?)");
> 
> conn.setAutoCommit(false);
> for (int i = 0; i < 1000; ++i) {
>     ps.setX(1, <whatever>);
>     ps.setX(2, <whatever>); 
>     ps.setX(3, <whatever>);
>     ps.executeUpdate();
> }
> conn.commit();

This is the code I would start with for iterative inserts in embedded
systems, with the loop as big as makes sense for the application.
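
Purely for comparison with Daniel's addBatch()/executeBatch() numbers
above, the JDBC batch variant of the same loop would look roughly like
this (same placeholder table T and columns as in my sketch earlier in
this message); as the numbers show, in embedded Derby it buys you
little over plain executeUpdate():

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Assumes the same placeholder table T (ID INT, TAG VARCHAR(32), VAL INT)
// used in the earlier sketch.
public class BatchInsertExample {
    static void insertWithBatch(Connection conn, int rows) throws SQLException {
        conn.setAutoCommit(false);
        PreparedStatement ps =
            conn.prepareStatement("INSERT INTO T VALUES (?, ?, ?)");
        for (int i = 0; i < rows; ++i) {
            ps.setInt(1, i);
            ps.setString(2, "tag-" + i);
            ps.setInt(3, i % 7);
            ps.addBatch();              // queue the row instead of executing it
            if ((i + 1) % 1000 == 0) {
                ps.executeBatch();      // execute the queued rows
            }
        }
        ps.executeBatch();              // execute any remaining queued rows
        conn.commit();
        ps.close();
    }
}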
> 
> ?
> 
> --
> dt
> 
> 

