db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kristian Waagan <Kristian.Waa...@Sun.COM>
Subject Re: Inserting data into a database on a Derby network server
Date Tue, 09 Oct 2007 12:59:12 GMT
John Embretsen wrote:
> Kristian Waagan wrote:
>> John Embretsen wrote:
>>> Hi Ture, 
>> [ snip ]
>>> I didn't think it was this easy to fill up the heap (by not 
>>> explicitly closing Statement objects) anymore (see e.g. DERBY-210), 
>>> but there are obviously some vulnerabilities left. So thank you for 
>>> sharing your code and for reporting this!
>> Hi John,
>> I think the reason why the OOME happened so fast in this application 
>> was because of the sorting (ORDER BY). I don't know enough about how 
>> sorting is implemented in Derby to have an opinion, but I think we 
>> should have a look at it.
>> I will do a little pre-investigation and create a Jira if appropriate.
> OK, thanks! From quickly reading the thread I was under the impression 
> that adding a call to Statement.close() solved the problem, but I guess 
> that was only part of it.
> One of the DOTS tests I've been running also stumbled across a memory 
> leak related to sorting, see 
> http://issues.apache.org/jira/browse/DERBY-2176 . Interesting 
> coincidence. That was fixed with, though, so this must be 
> something else.

Hi again,

After a quick look at the repro, I have concluded that Derby does not 
have a bug in this area.
That said, it might be possible to code very defensively and try to make 
up for the lack of explicit closing of resources (statements and 
resultsets). I believe this works for the embedded case, due to garbage 
collection. For this to work in c/s, the server must somehow be notified 
that the objects on the client side has been gc'ed.

Regarding the repro, only one change is needed to make it complete the 
loading; close the resultset 'rs' inside the loop. It might or might not 
work when you remove the reconnection-logic, depending on how much 
resources are tied to the statement-object on the server side and how 
big the heap is.
The sort-resources are tied to the resultset.
Even better, close the statement or use a prepared statement.

Regarding the reduced insertion performance over time, this was simply 
caused by a missing index. By adding an index for the column 'name' in 
Quaternary, the times look like this on my computer:
Connecting to the database
5000 rows inserted (8s)
10000 rows inserted (5s)
15000 rows inserted (5s)
20000 rows inserted (5s)
25000 rows inserted (5s)
30000 rows inserted (5s)
35000 rows inserted (5s)
40000 rows inserted (5s)
45000 rows inserted (5s)
50000 rows inserted (5s)
55000 rows inserted (5s)
60000 rows inserted (5s)

real    1m11.429s
user    0m13.909s
sys     0m2.606s

Now that's a nice performance increase :) (from ~30 to ~1 minute).
And you pay for it by increased storage cost (for the index). With the 
index the database doesn't have to scan the whole table on each insert 
to answer the 'SELECT id FROM Quaternary WHERE name=?'-query.



View raw message