db-ojb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Lance Eason" <lance.ea...@whisperwire.com>
Subject RE: Idea for further OJB performance improvements
Date Wed, 14 May 2003 18:09:26 GMT
This is new to me.  Are there docs/examples?

-----Original Message-----
From: Jakob Braeuchi [mailto:jbraeuchi@gmx.ch]
Sent: Wednesday, May 14, 2003 1:07 PM
To: OJB Developers List
Subject: Re: Idea for further OJB performance improvements

hi lance,

to reduce the number of queries used for relationship retrieval you can 
already use prefetched queries.


Lance Eason wrote:

>Thomas, Armin, et al.,
>I have another idea to bounce off you.  In my recent tuning exercise I got my biggest
performance improvement from actually taking management of certain relationships away from
OJB which makes me kind of sad.  Whereas the changes I posted to the list account for about
a 20% improvement this bought me a 600% improvement (from 24 seconds down to 4 seconds). 
What I did is this...
>Let's say we have three tables, PURCHASE_ORDER, LINE_ITEM and LINE_ITEM_NOTE, where LINE_ITEMs
are children of PURCHASE_ORDERs and LINE_ITEM_NOTEs are children of LINE_ITEMs.  Now say OJB
is defined to manage those relationships, a PurchaseOrder object has a collection of LineItem
objects and each LineItem has a collection of LineItemNote objects.  I do a query for PurchaseOrders
where customer_id = 20 and let's say I get back 10 PurchaseOrders each of which contains 5
>The way OJB will handle this is:
>   - Do a select 'SELECT * FROM PURCHASE_ORDER WHERE customer_id = 20' to retrieve PurchaseOrders
>   - For each of the 10 PurchaseOrders do an additional select 'SELECT * FROM LINE_ITEM
WHERE purchase_order_id = X' for a total of 10 more queries
>   - For each of the 50 LineItems do an additional select 'SELECT * FROM LINE_ITEM_NOTE
WHERE line_item_id = X' for a total of 50 more queries
>The point is that the number of queries rapidly explodes as relationships are followed
and each of those queries has overhead associated with it (getting a broker to use, creating
the statement, and the actual network hop and processing time of the query itself).  I can
use proxies to delay doing the queries until the information is actually asked for but if
I really do need all the information the net amount of work is the same.
>What I did instead was:
>   - Do a select to retrieve PurchaseOrders, 'SELECT * FROM PURCHASE_ORDER WHERE customer_id
= 20'
>   - Do a single select to retrieve all associated line items 
>       'SELECT li.* 
>         WHERE li.purchase_order_id = po.purchase_order_id
>           AND po.customer_id = 20'
>   - Use the LineItem's foreign key back to the PurchaseOrder to attach it to the right
PurchaseOrder from the original select
>   - Do a single select to retrieve all associated line item notes
>       'SELECT lin.*
>         WHERE lin.line_item_id = li.line_item_id
>           AND li.purchase_order_id = po.purchase_order_id
>           AND po.customer_id = 20'
>   - Use the LineItemNote's foreign key back to the LineItem to attach it to the right
>So the queries have gone down from 61 to 3 and the processing time has improved dramatically.
 This type of thing can be done in general where you use the same criteria that was used to
generate the top level query and just add the appropriate join information to get the data
for the related table you're interested in.
>Of course given caching, doing this doesn't always make sense.  For instance if 9 of the
10 PurchaseOrders were already populated and in my cache then OJB's current strategy is likely
better than retrieving the LineItem and LineItemNotes for all 10 when all I need is the 1
uncached one.  The way I handled this was to define a threshold for the ratio between items
returned by the query and the one's not found in the cache.  If that ratio was high enough
I used this approach otherwise I fell back on OJB's approach (except that I took advantage
of the fact that I use single column integer primary keys and just generated an IN clause
with all the keys to still do it in a single query).
>I'm pretty sure OJB has all the information it needs to do this same type of thing and
I think it would be great if I could just let OJB handle it for me rather than pulling responsibility
away from OJB.  Any thoughts? 
>To unsubscribe, e-mail: ojb-dev-unsubscribe@db.apache.org
>For additional commands, e-mail: ojb-dev-help@db.apache.org

To unsubscribe, e-mail: ojb-dev-unsubscribe@db.apache.org
For additional commands, e-mail: ojb-dev-help@db.apache.org

View raw message