db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kristian Waagan <Kristian.Waa...@Sun.COM>
Subject Re: Inserting data into a database on a Derby network server
Date Mon, 08 Oct 2007 18:31:52 GMT
Bryan Pendleton wrote:
>> database. As the data are published scientific data I can share my 
>> source code
>> and the data. Everything is available on
>> www.fysik.dtu.dk/~munter/derby/
> 
> Thanks for sharing your sample program. It looks like you are creating
> statements, but never closing them.
> 
> Each call to cDerby.createStatement() should be matched with a call to 
> close()
> on the statement.

Ture,

As Bryan says, you should close your statements. Even better would be to 
use a PreparedStatement for all you queries (including those without any 
  variables).

Unless it is an application requirement, you could also do with just one 
table and use the (Prepared)Statement.getGeneratedKeys() if you need to 
obtain the unique identifier after insertion. Something like this:
   // Note that the first marker (?) is replaced with DEFAULT.
   dataInsert = cDerby.prepareStatement(
      "INSERT INTO Quaternary VALUES (
           DEFAULT, ?, CURRENT_TIMESTAMP, ?, ?, ?, ?, ?, ?)",
       Statement.RETURN_GENERATED_KEYS);
   ...
   affectedRows = dataInsert.executeUpdate();
   ResultSet rs = dataInsert.getGeneratedKeys();
   rs.next();
   int sysid = rs.getInt(1);
   rs.close();

Don't know how important it is anymore (with newer Java versions), but 
you might see a little improvement by using 'Float.valueOf(strings[3])' 
instead of 'new Float(strings[3].floatValue())'.

Further, you should not have to reconnect after a few hundred 
insertions. If you have to, it probably means one out of two things; the 
application code is not optimal, or there is a bug in Derby.
In this case, I *guess* that not closing the statements caused the heap 
to fill up. Particularly, the query is sorting the data with an ORDER 
BY. The excerpt of the following heap histogram tells the story:

num   #instances    #bytes  class name
--------------------------------------
   1:   1360547    54421880  org.apache.derby.impl.store.access.sort.Node
   2:       293     6282608  [Lorg.apache.derby.impl.store.access.sort.Node;
   3:     33641     3463176  <constMethodKlass>
   4:     33641     2696744  <methodKlass>
   5:      4919     2097344  [B
   6:     47123     1986192  <symbolKlass>
   7:      2547     1468640  <constantPoolKlass>

But, where are all the statement objects?
I think this case deserves a little more investigation to make sure we 
don't have a leak related to the Node-objects in Derby.
Can anyone shed some light on this based on previous experience?


Also, the reduced insertion rate can easily be observed with the repro.
The number of rows is accumulated, the duration is not (i.e. the 
durations printed are all for inserting 5000 rows). The numbers below 
are from a run where a commit is done every 5000 rows, which turned out 
to be lightly worse than every 50 rows (clocked in at 28m36s):
Connecting to the database
5000 rows inserted (23s)
10000 rows inserted (38s)
15000 rows inserted (54s)
20000 rows inserted (77s)
25000 rows inserted (93s)
30000 rows inserted (104s)
35000 rows inserted (127s)
40000 rows inserted (162s)
45000 rows inserted (186s)
50000 rows inserted (208s)
55000 rows inserted (234s)
60000 rows inserted (279s)

real    30m38.104s
user    0m20.898s
sys     0m4.452s

As can be seen, the time it takes to insert 5000 rows rises from 23 
seconds at startup to over nearly 300 seconds (5 minutes). This has to 
be investigated as well. I'll see if I can have a look soon, but anyone 
else is free to check it out. If I get around to it, I'll modify the 
script slightly and create a Jira.


Thanks Ture for reporting this.


regards,
-- 
Kristian


> 
> thanks,
> 
> bryan
> 


Mime
View raw message