db-ojb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Lance Eason" <lance.ea...@whisperwire.com>
Subject Idea for further OJB performance improvements
Date Tue, 13 May 2003 18:01:04 GMT
Thomas, Armin, et al.,

I have another idea to bounce off you.  In my recent tuning exercise I got my biggest performance
improvement from actually taking management of certain relationships away from OJB which makes
me kind of sad.  Whereas the changes I posted to the list account for about a 20% improvement
this bought me a 600% improvement (from 24 seconds down to 4 seconds).  What I did is this...

Let's say we have three tables, PURCHASE_ORDER, LINE_ITEM and LINE_ITEM_NOTE, where LINE_ITEMs
are children of PURCHASE_ORDERs and LINE_ITEM_NOTEs are children of LINE_ITEMs.  Now say OJB
is defined to manage those relationships, a PurchaseOrder object has a collection of LineItem
objects and each LineItem has a collection of LineItemNote objects.  I do a query for PurchaseOrders
where customer_id = 20 and let's say I get back 10 PurchaseOrders each of which contains 5

The way OJB will handle this is:

   - Do a select 'SELECT * FROM PURCHASE_ORDER WHERE customer_id = 20' to retrieve PurchaseOrders
   - For each of the 10 PurchaseOrders do an additional select 'SELECT * FROM LINE_ITEM WHERE
purchase_order_id = X' for a total of 10 more queries
   - For each of the 50 LineItems do an additional select 'SELECT * FROM LINE_ITEM_NOTE WHERE
line_item_id = X' for a total of 50 more queries

The point is that the number of queries rapidly explodes as relationships are followed and
each of those queries has overhead associated with it (getting a broker to use, creating the
statement, and the actual network hop and processing time of the query itself).  I can use
proxies to delay doing the queries until the information is actually asked for but if I really
do need all the information the net amount of work is the same.

What I did instead was:

   - Do a select to retrieve PurchaseOrders, 'SELECT * FROM PURCHASE_ORDER WHERE customer_id
= 20'
   - Do a single select to retrieve all associated line items 
       'SELECT li.* 
         WHERE li.purchase_order_id = po.purchase_order_id
           AND po.customer_id = 20'
   - Use the LineItem's foreign key back to the PurchaseOrder to attach it to the right PurchaseOrder
from the original select
   - Do a single select to retrieve all associated line item notes
       'SELECT lin.*
         WHERE lin.line_item_id = li.line_item_id
           AND li.purchase_order_id = po.purchase_order_id
           AND po.customer_id = 20'
   - Use the LineItemNote's foreign key back to the LineItem to attach it to the right LineItem

So the queries have gone down from 61 to 3 and the processing time has improved dramatically.
 This type of thing can be done in general where you use the same criteria that was used to
generate the top level query and just add the appropriate join information to get the data
for the related table you're interested in.

Of course given caching, doing this doesn't always make sense.  For instance if 9 of the 10
PurchaseOrders were already populated and in my cache then OJB's current strategy is likely
better than retrieving the LineItem and LineItemNotes for all 10 when all I need is the 1
uncached one.  The way I handled this was to define a threshold for the ratio between items
returned by the query and the one's not found in the cache.  If that ratio was high enough
I used this approach otherwise I fell back on OJB's approach (except that I took advantage
of the fact that I use single column integer primary keys and just generated an IN clause
with all the keys to still do it in a single query).

I'm pretty sure OJB has all the information it needs to do this same type of thing and I think
it would be great if I could just let OJB handle it for me rather than pulling responsibility
away from OJB.  Any thoughts? 

View raw message