db-derby-dev mailing list archives

From Olav Sandstaa <o...@sun.com>
Subject Re: Blog response to Oracle white paper
Date Thu, 30 Nov 2006 23:14:55 GMT
Great blog contribution, David!  I also find the report highly biased
and agree with the other comments regarding the presented results and
the conclusions drawn. But even if comparing Derby to BDB Java Edition
is comparing apples to oranges, I still think some of the results are
worth a closer look.

For instance, the first reference to this report was posted on a
derby-user thread where someone wanted to use the Derby store directly
to avoid "SQL/JDBC because it has unacceptable overhead", and where
the main access functions were key-based put/get operations [1]. In
response he got advice ranging from "use BDB Java Edition" to "try it
with Derby to determine whether the overhead of using JDBC/SQL really
is too high".

In several places the BDB paper reaches the same conclusion: that
Derby's lower performance compared to BDB is "probably due to the
overhead of SQL processing in Derby". The experiment most similar to
the "get record based on key" scenario requested in [1] is shown in
the figure named "Random read". In that figure BDB appears to have 5X
the performance of Derby (100,000 vs 20,000 records per second).

I think it would be useful to understand the main reasons why BDB is
able to retrieve five times more tuples than Derby. Given that
prepared statements are used, I find it hard to believe that the
"overhead of SQL processing" should add 400% to the cost of retrieving
a record from a B-tree. I do not know BDB Java Edition well, but as
far as I understand it runs with transactions and transaction
isolation, so that cost applies to both BDB and Derby. I have not seen
the code for the test application(s), so it is a bit hard to know what
they actually measure, but the 20,000 records/second for Derby is
similar to the number of single-record select queries I get when
running on comparable hardware.
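For reference, the access pattern being measured is roughly the
following: a single-record key lookup through JDBC with a prepared
statement, so the SQL is parsed and compiled once and only re-executed
per lookup. This is just a minimal sketch, not the paper's actual test
code; the table and column names are made up, it uses Derby's
in-memory subprotocol for brevity, and it assumes derby.jar is on the
classpath.

```java
import java.sql.*;

public class KeyLookup {
    public static void main(String[] args) throws SQLException {
        // Hypothetical schema; the benchmark's real schema is unknown.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:derby:memory:demo;create=true")) {
            try (Statement s = conn.createStatement()) {
                s.executeUpdate(
                    "CREATE TABLE kv (k INT PRIMARY KEY, v VARCHAR(32))");
                s.executeUpdate("INSERT INTO kv VALUES (1, 'hello')");
            }
            // Prepare once; only execution cost is paid per lookup,
            // which is why SQL compilation overhead should be amortized.
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT v FROM kv WHERE k = ?")) {
                ps.setInt(1, 1);
                try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next()) {
                        System.out.println(rs.getString(1));
                    }
                }
            }
        }
    }
}
```

Whatever residual cost remains per execution of such a prepared
lookup, relative to a direct B-tree get, is the overhead in question.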

Is there other necessary functionality present in Derby (and missing
in BDB) that adds to the cost, besides the JDBC and SQL layers? Or is
most of this 400% extra cost really due to the "overhead of SQL
processing"? It would be great to hear opinions from others on the
list: whether this is a reasonable overhead compared to BDB, what the
causes of the overhead are, and whether this is something we should
try to improve. It could of course also be that BDB is doing something
"very smart", or that the test that was run is unfair in some way.

[1] http://www.nabble.com/Using-Derby-as-a-binary-store-tf2696662.html
