incubator-jena-dev mailing list archives

From Simon Helsen <>
Subject Re: TDB: release process
Date Tue, 10 Jan 2012 23:24:54 GMT

Yes, I'll look into it as soon as I have cycles again. And no, I have not 
yet tried the non-transactional API in 2.7.0. I actually want to do that 
at some point to have a cleaner baseline. 

In the meantime, here is a summary of the results I found:

1) when I run with 1 client, query and store execution times are comparable 
between the two versions. I have detailed numbers, but they don't help much
2) things become interesting when I start scaling up the number of clients 
(one of the principal motivations to move to TDB Tx). The data below is 
for the following scenario:

* 50 clients
* the operations of each client are a mixture of queries and write 
operations, where I execute a write operation for every 7th query
* the queries are deterministically taken from a pool of about 35 queries 
with varying complexity. When run in 1 client, they take anywhere from a 
few ms to almost 2 seconds for the most intense query
* between each operation, I wait 2s
* there is plenty of memory/heap available. I use a 64-bit machine with 
8 GB of memory, of which 4 GB is used for the Java heap.
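
The per-client operation mix described above can be sketched as follows. This is only an illustration, not the actual test suite: the class and method names are hypothetical, the pool size is taken as exactly 35, and "a write operation for every 7th query" is read here as one write after every 7 queries. The 2s wait between operations is left out.

```java
import java.util.ArrayList;
import java.util.List;

class WorkloadSketch {
    static final int QUERY_POOL_SIZE = 35;  // "about 35 queries" in the scenario
    static final int QUERIES_PER_WRITE = 7; // assumed reading of "every 7th query"

    /** Returns the first n operations of one client, e.g. "query-0" or "write". */
    static List<String> schedule(int n) {
        List<String> ops = new ArrayList<>();
        int queriesIssued = 0;
        while (ops.size() < n) {
            // Queries are picked deterministically by cycling through the pool.
            ops.add("query-" + (queriesIssued % QUERY_POOL_SIZE));
            queriesIssued++;
            // After every 7 queries, issue one write operation.
            if (queriesIssued % QUERIES_PER_WRITE == 0 && ops.size() < n) {
                ops.add("write");
            }
        }
        return ops;
    }

    public static void main(String[] args) {
        System.out.println(schedule(16));
    }
}
```

A real client would sleep 2 seconds between consecutive entries of this schedule and record the duration of each operation.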

Note that in TDB we use an exclusive write lock for write operations and 
shared read locks for read operations. In TDBTx, I just use transactions 
(i.e. we do no locking of our own):
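
The application-side locking policy used with non-transactional TDB (multiple readers or a single writer) can be sketched with the standard Java read/write lock. This is a minimal sketch under assumptions: the class below is hypothetical, the in-memory list stands in for the store, and it is not the locking code of our product or of Jena.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class MrswStoreSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> triples = new ArrayList<>(); // stand-in for the store

    /** Read operation: many readers may hold the shared lock at once. */
    int query() {
        lock.readLock().lock();
        try {
            return triples.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Write operation: the exclusive lock blocks all readers and other writers. */
    void update(String triple) {
        lock.writeLock().lock();
        try {
            triples.add(triple);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        MrswStoreSketch store = new MrswStoreSketch();
        store.update("s p o");
        System.out.println(store.query());
    }
}
```

With this policy every write waits for all running queries to finish and holds up new ones, which is the contention the transactional API is meant to avoid.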

A) Here are the numbers for TDB (0.8.7 etc):

- total write time = 1345594ms, so about 1346s

              |   cnt |   avg |    max | min |   dev |        tot
DESCRIBE (ms) |   402 |   466 |  4,859 |   0 |   609 |    187,609
SELECT (ms)   | 4,618 | 4,809 | 93,453 |   0 | 9,621 | 22,211,907
PARALLELISM   | 5,020 |    14 |     41 |   0 |     8 |     79,066

A quick note about parallelism: this indicates effectively how much parallel 
activity was going on. For instance, on average, there were 14 queries 
running at the same time, with a maximum of 41. The total indicates how 
heavily query activity was running in parallel. 

B) Here are the numbers for TDBTx: 

- total write time = 166047ms, so about 166s

              |   cnt |    avg |     max | min |    dev |        tot
DESCRIBE (ms) |   168 |  2,557 |   9,219 |  31 |  1,769 |    429,645
SELECT (ms)   | 1,853 | 38,866 | 392,282 |   0 | 74,008 | 72,020,224
PARALLELISM   | 2,021 |     35 |      49 |   0 |     10 |     71,791

Note that although the test suites ran in the same way, the long 
query times in TDBTx caused several timeouts, which explains the 
substantially smaller number of completed queries. Even so, the total 
query time was still more than 3 times higher.

So, it seems that in this multi-client scenario, TDBTx is much better at 
avoiding lock contention around write operations, but it performs 
significantly worse for queries. One thing that is interesting is that 
TDBTx has a higher average number of queries running in parallel and a 
higher max. So, perhaps this is an important cause of the slowdown. 
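
The comparison can be made concrete by computing the ratios directly from the totals reported in A) and B). The class name below is hypothetical; all inputs are copied from the two tables and the two total write times.

```java
class RatioCheck {
    // Totals in ms, copied from the reported numbers.
    static final long TDB_WRITE = 1_345_594L;
    static final long TX_WRITE = 166_047L;
    static final long TDB_QUERY = 187_609L + 22_211_907L; // DESCRIBE + SELECT
    static final long TX_QUERY = 429_645L + 72_020_224L;  // DESCRIBE + SELECT
    static final long TDB_COUNT = 402L + 4_618L;          // completed queries
    static final long TX_COUNT = 168L + 1_853L;

    /** How many times faster TDBTx was in total write time. */
    static double writeSpeedup() {
        return (double) TDB_WRITE / TX_WRITE;
    }

    /** How many times larger the total query time was in TDBTx. */
    static double queryTotalSlowdown() {
        return (double) TX_QUERY / TDB_QUERY;
    }

    /** How many times larger the mean time per completed query was in TDBTx. */
    static double perQuerySlowdown() {
        double tdbMean = (double) TDB_QUERY / TDB_COUNT;
        double txMean = (double) TX_QUERY / TX_COUNT;
        return txMean / tdbMean;
    }

    public static void main(String[] args) {
        System.out.printf("write speedup:        %.2f%n", writeSpeedup());
        System.out.printf("query total slowdown: %.2f%n", queryTotalSlowdown());
        System.out.printf("per-query slowdown:   %.2f%n", perQuerySlowdown());
    }
}
```

On these numbers, total write time drops by a factor of about 8.1, total query time grows by a factor of about 3.2, and the mean time per completed query grows by a factor of about 8, since TDBTx completed far fewer queries.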

Hopefully these are useful. Has any of you done any performance 
measurements with transactional TDB?


Andy Seaborne <>
01/10/2012 02:04 PM
Re: TDB: release process

On 10/01/12 13:45, Andy Seaborne wrote:
> On 09/01/12 15:07, Simon Helsen wrote:
>> Andy, others,
>> I have been testing TxTDB on my end and functionally, things are
>> good. I am not able to see any immediate problems anymore. Of course,
>> there may still be more exotic things left, but those can probably be
>> managed in a minor release. However, now that it is getting good on the
>> functional end, I am starting to check the non-functional
>> characteristics, especially speed and scalability (in terms of multiple
>> clients). For this I use a test suite with about 35 different queries
>> and I compare the performance against Jena 2.6.3/ARQ 2.8.5 and TDB 0.8.7
>> because that is the version we currently use in the release of our
>> product. I am comparing these numbers then with Jena/ARQ 2.7.0 and TDB
>> 0.9.0 (20111229) and the transaction API. I realize this is partially
>> comparing apples to pears but from our perspective, we need to see how
>> the bottom line changes in terms of query speed when we increase the
>> number of concurrent clients.
>> I have detailed numbers, but before I start sharing these, I want to ask
>> if there is anything I could/should do to tune ARQ/TxTDB in terms of
>> performance. For instance, I wonder if there is still a whole range of
>> checks active which I can/should turn off now that we are functionally
>> more sound. For completeness, I should add that we don't use any
>> optimization (i.e. we run with none.opt )
>> thanks
>> Simon
> Simon,
> Figures would be good. If you use TDB without touching the transaction
> system then it should be the same as before (with the obvious chances of
> unintended changes). Have you run this way?
> Just creating a transaction, especially one that allows writes, has a
> cost and if the granularity is small then it's going to make a big
> difference. (This is one reason there isn't an "autocommit" mode - it
> only seems to end in trouble one way or another). Read transactions are
> cheaper but not free.
> In terms of tuning, TDB 0.9 needs more heap as the transaction
> intermediate state is held in RAM, with no proper spill-to-disk yet.
> There shouldn't be the internal consistency checking enabled. Hmm -
> better check yet again!
> Andy


Could you profile the tests and pass on the results?  Any testing code 
left should show as hotspots.

