From: "Kevin Sutter"
To: open-jpa-dev@incubator.apache.org, "Marc Prud'hommeaux"
Date: Wed, 30 May 2007 14:48:28 -0500
Subject: Re: missing getAll(List keys) method?
Message-ID: <89c0c52c0705301248o5be6bbf6x772311d9091d65a1@mail.gmail.com>
In-Reply-To: <70d5df710705291433q5251ebc2we9f9b2bd3cbcf39d@mail.gmail.com>
Mailing-List: contact open-jpa-dev-help@incubator.apache.org; run by ezmlm
Reply-To: open-jpa-dev@incubator.apache.org

Marc,

What are your views on this request? Since you seem to be intimately
familiar with the data cache API, do you see a problem with introducing
this additional get method, from either an expectation viewpoint or an
implementation viewpoint?

Thanks.
Kevin

On 5/29/07, Daniel Lee wrote:
>
> Hi Craig,
>
> The discussed API (getAll) is for fetching objects that are already
> cached in the DataCache. From what I understand, OpenJPA executes the
> following code when loading (find()) a customer that exists in the
> DataCache. It loads not only the customer but also the objects in any
> eager (direct and indirect) relationships.
> In the earlier example (a customer with 100 orders, where each order has
> different products), the direct relationships are all orders placed by
> the customer and the indirect relationships are all products in these
> orders.
>
>    1. BrokerImpl.find() calls DataCacheStoreManager.initialize() to
>    initialize a new state manager for an object (a customer with 100
>    orders, for example).
>    2. initialize() then issues get() to DataCache to see whether the data
>    (customer) is already cached. After successfully getting the customer
>    (data != null) from the datacache, DataCachePCData.load(sm, fetch,
>    edata) is invoked to load all the eager relationships (orders in the
>    example) of the object (customer).
>    3. PCDataImpl.load() loops through the relationship fields to call
>    loadField() for each relationship which is not yet loaded. In this
>    example, it is the customer's (eager, one-to-many) relationship to
>    its orders.
>    4. loadField() calls toField(), which is defined in AbstractPCData.
>    5. toField() LOOPS through all elements (orders) to invoke
>    toNestedField() for each element. This is 100 toNestedField() calls
>    for the 100 orders in the example.
>    6. toNestedField() calls toRelationField(sm, vmd, data, fetch,
>    context), which actually calls find() and recursively goes back to
>    step 1 above to load a single order. This ends up calling get() on
>    the DataCache 100 times for the 100 orders and can possibly get into
>    another loop for loading all products in each order, etc.
>
> Because of the loop in step 5 above, a single "find(customerA)" statement
> actually triggers 100 DataCache.get() calls for its orders and could
> trigger hundreds or thousands more get() calls for the products ordered
> by the customer. This is a performance hit as I understand it.
>
> If we have a getAll(List keys) method which returns a list of objects
> from the datacache, we can change the logic to call the following new
> methods to get all elements (orders/products) in one relationship in a
> single call to getAll(), instead of calling get() a hundred times for
> 100 orders.
>
>    - toNestedFields() - called by toFields without the loop
>    - toRelationFields() - called by toNestedFields(); calls findAll()
>    - findAll() needs to be able to initialize a List of sm and call
>    initializeAll()
>    - initializeAll() - calls getAll() instead of get(), then iterates
>    over the return value to call load
>
> This is more like doing a batch fetch from the DataCache. There should be
> some significant performance improvement, especially in a distributed
> environment, in which the communication/serialization area is known to be
> the bottleneck of the whole process. This implementation can also
> potentially provide much better performance for third-party DataCache
> plug-ins which provide and optimize a getAll() process.
>
> Hope this makes the issue clearer this time. Please let me know if you
> have further questions or other concerns. Many thanks.
>
> Daniel
>
> On 5/24/07, Craig L Russell wrote:
> >
> > Hi Daniel,
> >
> > On May 24, 2007, at 11:59 AM, Daniel Lee wrote:
> >
> > > Hi Craig,
> > >
> > > I think findAll() is different. It is a client-level API, and the
> > > getAll() here is for internal fetches from the data cache.
> > >
> > > In the example, when an application issues findAll() for a list of
> > > customers, it internally, for each customer with order(s), loads the
> > > "eager" relationship (orders) from the data cache, if they are
> > > already cached, by calling map.get(orderId) for each order placed by
> > > the customer. It then loads the items that are related to each order
> > > by calling map.get(itemId) for each item if the relationship to
> > > Order is declared as eager.
> > > This is potentially a performance bottleneck and findAll() does not
> > > avoid this.
> >
> > Seems that this algorithm can be improved to use the broker's findAll
> > mechanism when the instance is not found in the cache. The not-found
> > instances can be found more efficiently than the code currently does.
> >
> > Craig
> > >
> > > Thanks.
> > > Daniel
> > >
> > >
> > > On 5/23/07, Craig L Russell wrote:
> > >>
> > >> Hi Daniel,
> > >>
> > >> Take a look at the findAll(Collection oids) method of
> > >> OpenJPAEntityManager. This should do a better job than N
> > >> get(Object key) calls.
> > >>
> > >> Craig
> > >>
> > >> On May 23, 2007, at 3:55 PM, Daniel Lee wrote:
> > >>
> > >> > Are we missing a getAll(List keys) method for the data cache?
> > >> >
> > >> > When fetching objects with eager "to-many" relationships, the
> > >> > code calls get(Object key) multiple times (once for each object
> > >> > in the relationship). For example, it does 1 get() call for each
> > >> > order placed by a customer which we are fetching; that means 100
> > >> > calls for a customer with 100 orders. The performance can be
> > >> > greatly improved if we have a getAll(List keys) method which
> > >> > returns all orders in one call. This is especially important in
> > >> > a distributed environment.
> > >> >
> > >> > Is there a way (a new plug-in) to avoid the multiple trips for a
> > >> > single relationship, or can we implement the code to improve the
> > >> > performance in this area?
> > >> >
> > >> > Many thanks.
> > >> > Daniel
> > >>
> > >> Craig Russell
> > >> Architect, Sun Java Enterprise System http://java.sun.com/products/jdo
> > >> 408 276-5638 mailto:Craig.Russell@sun.com
> > >> P.S. A good JDO? O, Gasp!
> > >>
> >
> > Craig Russell
> > Architect, Sun Java Enterprise System http://java.sun.com/products/jdo
> > 408 276-5638 mailto:Craig.Russell@sun.com
> > P.S. A good JDO? O, Gasp!
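
The difference between the N get(Object key) calls described in the thread
and the proposed single getAll(List keys) call can be sketched as below.
This is only an illustration under stated assumptions: KeyedCache,
CountingCache, loadOneByOne() and loadBatched() are hypothetical names
invented for this sketch and are not OpenJPA's DataCache API, and the
in-memory map merely stands in for a cache where each call would cost a
round trip.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BatchCacheSketch {

    /** Hypothetical cache contract for illustration; NOT OpenJPA's DataCache. */
    interface KeyedCache<K, V> {
        V get(K key);                    // one round trip per key
        Map<K, V> getAll(List<K> keys);  // one round trip for the whole list
    }

    /** In-memory stand-in that counts how many cache calls were made. */
    static final class CountingCache<K, V> implements KeyedCache<K, V> {
        private final Map<K, V> store = new HashMap<>();
        private int roundTrips;

        void put(K key, V value) { store.put(key, value); }

        @Override
        public V get(K key) {
            roundTrips++;
            return store.get(key);
        }

        @Override
        public Map<K, V> getAll(List<K> keys) {
            roundTrips++;
            Map<K, V> found = new LinkedHashMap<>(); // keeps request order
            for (K key : keys) {
                V value = store.get(key);
                if (value != null) {
                    found.put(key, value); // cache misses are simply absent
                }
            }
            return found;
        }

        int roundTrips() { return roundTrips; }
        void resetRoundTrips() { roundTrips = 0; }
    }

    /** Pattern described in the thread: one get() per related key. */
    static <K, V> List<V> loadOneByOne(KeyedCache<K, V> cache, List<K> keys) {
        List<V> loaded = new ArrayList<>();
        for (K key : keys) {
            V value = cache.get(key);
            if (value != null) {
                loaded.add(value);
            }
        }
        return loaded;
    }

    /** Proposed pattern: a single getAll() for the whole relationship. */
    static <K, V> List<V> loadBatched(KeyedCache<K, V> cache, List<K> keys) {
        return new ArrayList<>(cache.getAll(keys).values());
    }

    public static void main(String[] args) {
        CountingCache<Integer, String> cache = new CountingCache<>();
        List<Integer> orderIds = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            cache.put(i, "order-" + i);
            orderIds.add(i);
        }

        loadOneByOne(cache, orderIds);
        System.out.println("one-by-one cache calls: " + cache.roundTrips());

        cache.resetRoundTrips();
        loadBatched(cache, orderIds);
        System.out.println("batched cache calls: " + cache.roundTrips());
    }
}
```

In this sketch getAll() returns a map rather than a list so that a caller
can tell which keys were misses and fall back to another lookup for those,
in the spirit of Craig's suggestion; that return type is a design choice of
the sketch, not something settled in the thread.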