db-ojb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jakob Braeuchi <jbraeu...@gmx.ch>
Subject Re: Idea for further OJB performance improvements
Date Wed, 14 May 2003 18:30:13 GMT
btw could you please send me the SqlHelper stuff ?

Lance Eason wrote:

>Yeah I just found it.  Cool stuff.  There is an implicit assumption that primary keys
are single column for this to work, but that's how I model them anyway.
>
>-----Original Message-----
>From: Lance Eason 
>Sent: Wednesday, May 14, 2003 1:09 PM
>To: OJB Developers List
>Subject: RE: Idea for further OJB performance improvements
>
>
>This is new to me.  Are there docs/examples?
>
>-----Original Message-----
>From: Jakob Braeuchi [mailto:jbraeuchi@gmx.ch]
>Sent: Wednesday, May 14, 2003 1:07 PM
>To: OJB Developers List
>Subject: Re: Idea for further OJB performance improvements
>
>
>hi lance,
>
>to reduce the number of queries used for relationship retrieval you can 
>already use prefetched queries.
>
>hth
>jakob
>
>Lance Eason wrote:
>
>  
>
>>Thomas, Armin, et al.,
>>
>>I have another idea to bounce off you.  In my recent tuning exercise I got my biggest
performance improvement from actually taking management of certain relationships away from
OJB which makes me kind of sad.  Whereas the changes I posted to the list account for about
a 20% improvement this bought me a 600% improvement (from 24 seconds down to 4 seconds). 
What I did is this...
>>
>>Let's say we have three tables, PURCHASE_ORDER, LINE_ITEM and LINE_ITEM_NOTE, where
LINE_ITEMs are children of PURCHASE_ORDERs and LINE_ITEM_NOTEs are children of LINE_ITEMs.
 Now say OJB is defined to manage those relationships, a PurchaseOrder object has a collection
of LineItem objects and each LineItem has a collection of LineItemNote objects.  I do a query
for PurchaseOrders where customer_id = 20 and let's say I get back 10 PurchaseOrders each
of which contains 5 LineItems.
>>
>>The way OJB will handle this is:
>>
>>  - Do a select 'SELECT * FROM PURCHASE_ORDER WHERE customer_id = 20' to retrieve
PurchaseOrders
>>  - For each of the 10 PurchaseOrders do an additional select 'SELECT * FROM LINE_ITEM
WHERE purchase_order_id = X' for a total of 10 more queries
>>  - For each of the 50 LineItems do an additional select 'SELECT * FROM LINE_ITEM_NOTE
WHERE line_item_id = X' for a total of 50 more queries
>>
>>The point is that the number of queries rapidly explodes as relationships are followed
and each of those queries has overhead associated with it (getting a broker to use, creating
the statement, and the actual network hop and processing time of the query itself).  I can
use proxies to delay doing the queries until the information is actually asked for but if
I really do need all the information the net amount of work is the same.
>>
>>What I did instead was:
>>
>>  - Do a select to retrieve PurchaseOrders, 'SELECT * FROM PURCHASE_ORDER WHERE customer_id
= 20'
>>  - Do a single select to retrieve all associated line items 
>>      'SELECT li.* 
>>         FROM PURCHASE_ORDER po, LINE_ITEM li
>>        WHERE li.purchase_order_id = po.purchase_order_id
>>          AND po.customer_id = 20'
>>  - Use the LineItem's foreign key back to the PurchaseOrder to attach it to the right
PurchaseOrder from the original select
>>  - Do a single select to retrieve all associated line item notes
>>      'SELECT lin.*
>>         FROM PURCHASE_ORDER po, LINE_ITEM li, LINE_ITEM_NOTE lin
>>        WHERE lin.line_item_id = li.line_item_id
>>          AND li.purchase_order_id = po.purchase_order_id
>>          AND po.customer_id = 20'
>>  - Use the LineItemNote's foreign key back to the LineItem to attach it to the right
LineItem
>>
>>So the queries have gone down from 61 to 3 and the processing time has improved dramatically.
 This type of thing can be done in general where you use the same criteria that was used to
generate the top level query and just add the appropriate join information to get the data
for the related table you're interested in.
>>
>>Of course given caching, doing this doesn't always make sense.  For instance if 9
of the 10 PurchaseOrders were already populated and in my cache then OJB's current strategy
is likely better than retrieving the LineItem and LineItemNotes for all 10 when all I need
is the 1 uncached one.  The way I handled this was to define a threshold for the ratio between
items returned by the query and the one's not found in the cache.  If that ratio was high
enough I used this approach otherwise I fell back on OJB's approach (except that I took advantage
of the fact that I use single column integer primary keys and just generated an IN clause
with all the keys to still do it in a single query).
>>
>>I'm pretty sure OJB has all the information it needs to do this same type of thing
and I think it would be great if I could just let OJB handle it for me rather than pulling
responsibility away from OJB.  Any thoughts? 
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: ojb-dev-unsubscribe@db.apache.org
>>For additional commands, e-mail: ojb-dev-help@db.apache.org
>>
>>
>> 
>>
>>    
>>
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: ojb-dev-unsubscribe@db.apache.org
>For additional commands, e-mail: ojb-dev-help@db.apache.org
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: ojb-dev-unsubscribe@db.apache.org
>For additional commands, e-mail: ojb-dev-help@db.apache.org
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: ojb-dev-unsubscribe@db.apache.org
>For additional commands, e-mail: ojb-dev-help@db.apache.org
>
>
>  
>


Mime
View raw message