openjpa-dev mailing list archives

From "Daniel Lee" <tsunfang...@gmail.com>
Subject Re: missing getAll(List keys) method?
Date Thu, 31 May 2007 17:21:11 GMT
I can put together a sample implementation and report a comparison of the
performance measurements.

Daniel

On 5/30/07, Marc Prud'hommeaux <mprudhom@bea.com> wrote:
>
>
> I personally think it sounds like a good idea that has a lot of
> potential for performance improvement.
>
> Perhaps someone could come up with a sample implementation that adds
> the API and a default implementation in the DataCacheImpl and compare
> the performance in the scenario mentioned below? That would help
> establish a concrete justification for enhancing the DataCache
> interface.
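Before any wall-clock timing, one way to make the comparison concrete is to count cache round trips. This is a minimal sketch under stated assumptions: CountingCache and GetVsGetAll are hypothetical classes, not OpenJPA's DataCache or DataCacheImpl, and every call to the stand-in cache is treated as one network round trip.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a remote cache (not OpenJPA's DataCache):
// every public lookup counts as one network round trip.
class CountingCache {
    private final Map<Object, Object> store = new HashMap<>();
    private int roundTrips = 0;

    void put(Object key, Object value) {
        store.put(key, value);
    }

    // One round trip per key.
    Object get(Object key) {
        roundTrips++;
        return store.get(key);
    }

    // One round trip for the whole batch; misses are left out of the result.
    Map<Object, Object> getAll(List<?> keys) {
        roundTrips++;
        Map<Object, Object> result = new LinkedHashMap<>();
        for (Object key : keys) {
            Object value = store.get(key);
            if (value != null) {
                result.put(key, value);
            }
        }
        return result;
    }

    int roundTrips() {
        return roundTrips;
    }

    void resetCounter() {
        roundTrips = 0;
    }
}

class GetVsGetAll {
    // Fetches every key with its own get(); returns the round trips used.
    static int loadOneByOne(CountingCache cache, List<?> keys) {
        cache.resetCounter();
        for (Object key : keys) {
            cache.get(key);
        }
        return cache.roundTrips();
    }

    // Fetches all keys with a single getAll(); returns the round trips used.
    static int loadBatched(CountingCache cache, List<?> keys) {
        cache.resetCounter();
        cache.getAll(keys);
        return cache.roundTrips();
    }

    public static void main(String[] args) {
        CountingCache cache = new CountingCache();
        List<Object> orderIds = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            String id = "order-" + i;
            cache.put(id, "Order " + i);
            orderIds.add(id);
        }
        System.out.println("get() loop: " + loadOneByOne(cache, orderIds) + " round trips");
        System.out.println("getAll():   " + loadBatched(cache, orderIds) + " round trips");
    }
}
```

With 100 cached orders the get() loop uses 100 round trips and the batched call uses 1. A real comparison would still need timings against an actual DataCache plug-in, since round-trip count is only a proxy for cost.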
>
>
>
> On May 30, 2007, at 12:48 PM, Kevin Sutter wrote:
>
> > Marc,
> > What are your views on this request?  Since you seem to be
> > intimately familiar with the data cache API, do you see a problem
> > with introducing this additional get method?  Either from an
> > expectation viewpoint or an implementation viewpoint?  Thanks.
> >
> > Kevin
> >
> > On 5/29/07, Daniel Lee <tsunfanglee@gmail.com> wrote:
> > Hi Craig,
> >
> > The API under discussion (getAll) is for fetching objects that are already
> > cached in the DataCache.  From what I understand, OpenJPA executes the
> > following code path when loading (find()) a customer that exists in the
> > DataCache.  It loads not only the customer but also the objects in any
> > eager (direct and indirect) relationships.  In the earlier example (a
> > customer with 100 orders, where each order has different products), the
> > direct relationships are all orders placed by the customer and the
> > indirect relationships are all products in these orders.
> >
> >    1. BrokerImpl.find() calls DataCacheStoreManager.initialize() to
> >       initialize a new state manager for an object (a customer with 100
> >       orders, for example).
> >    2. initialize() then issues get() to the DataCache to see whether the
> >       data (customer) is already cached.  After successfully getting the
> >       customer (data != null) from the DataCache,
> >       DataCachePCData.load(sm, fetch, edata) is invoked to load all the
> >       eager relationships (orders in the example) of the object
> >       (customer).
> >    3. PCDataImpl.load() loops through the relationship fields and calls
> >       loadField() for each relationship that is not yet loaded.  In this
> >       example, that is the customer-to-orders (eager, one-to-many)
> >       relationship.
> >    4. loadField() calls toField(), which is defined in AbstractPCData.
> >    5. toField() LOOPS through all elements (orders) and invokes
> >       toNestedField() for each element.  That is 100 toNestedField()
> >       calls for the 100 orders in the example.
> >    6. toNestedField() calls toRelationField(sm, vmd, data, fetch, context),
> >       which actually calls find() and so recursively returns to step 1
> >       above to load a single order.  This ends up calling get() on the
> >       DataCache 100 times for the 100 orders, and can go into another loop
> >       to load all the products in each order, and so on.
> >
> > Because of the loop in step 5 above, a single find(customerA) call
> > actually triggers 100 DataCache.get() calls for its orders, and possibly
> > hundreds or thousands more get() calls for the products ordered by the
> > customer.  As I understand it, this is a performance hit.
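The call pattern in steps 1-6 can be modelled roughly as a recursive walk in which every related object costs one cache lookup. This is a simplified sketch, not OpenJPA code: CachedEntry, PerElementLoader and buildCache() are hypothetical, products are assumed to be distinct per order, and the model ignores any de-duplication OpenJPA may perform for objects already managed in the persistence context.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical cached entry: only the keys of the object's eager related objects.
class CachedEntry {
    final List<String> eagerRelatedKeys;

    CachedEntry(List<String> eagerRelatedKeys) {
        this.eagerRelatedKeys = eagerRelatedKeys;
    }
}

// Simplified model of steps 1-6: find() does one cache get(), then walks each
// eager relation element by element and calls find() again for every element.
class PerElementLoader {
    private final Map<String, CachedEntry> cache;
    int getCalls = 0;

    PerElementLoader(Map<String, CachedEntry> cache) {
        this.cache = cache;
    }

    void find(String key) {
        getCalls++;                           // steps 1-2: one get() per object
        CachedEntry entry = cache.get(key);
        if (entry == null) {
            return;                           // miss: the real code would go to the database
        }
        for (String relatedKey : entry.eagerRelatedKeys) {
            find(relatedKey);                 // steps 5-6: recursive find() per element
        }
    }

    // A customer with `orders` orders, each holding `productsPerOrder` distinct products.
    static Map<String, CachedEntry> buildCache(int orders, int productsPerOrder) {
        Map<String, CachedEntry> cache = new HashMap<>();
        List<String> orderKeys = new ArrayList<>();
        for (int o = 0; o < orders; o++) {
            List<String> productKeys = new ArrayList<>();
            for (int p = 0; p < productsPerOrder; p++) {
                String productKey = "product-" + o + "-" + p;
                cache.put(productKey, new CachedEntry(new ArrayList<>()));
                productKeys.add(productKey);
            }
            String orderKey = "order-" + o;
            cache.put(orderKey, new CachedEntry(productKeys));
            orderKeys.add(orderKey);
        }
        cache.put("customerA", new CachedEntry(orderKeys));
        return cache;
    }

    public static void main(String[] args) {
        PerElementLoader loader = new PerElementLoader(buildCache(100, 10));
        loader.find("customerA");
        // 1 customer + 100 orders + 100 * 10 products
        System.out.println(loader.getCalls + " get() calls"); // prints "1101 get() calls"
    }
}
```

Under these assumptions, one find() on a customer with 100 orders of 10 products each costs 1 + 100 + 100 * 10 = 1101 separate cache lookups.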
> >
> > If we had a getAll(List keys) method that returned a list of objects from
> > the DataCache, we could change the logic to call the following new
> > methods, fetching all elements (orders/products) of one relationship in a
> > single call to getAll() instead of calling get() a hundred times for 100
> > orders:
> >
> >    - toNestedFields() - called by toFields without the loop
> >    - toRelationFields() - called by toNestedFields(); calls findAll()
> >    - findAll() - needs to be able to initialize a List of state managers
> >      and call initializeAll()
> >    - initializeAll() - calls getAll() instead of get(), then iterates over
> >      the returned results to call load()
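The proposed change can be sketched in the same kind of simplified model: each eager relationship is fetched with one getAll() call rather than one get() per element. BatchLoader, its private get()/getAll() and initializeAll() are hypothetical names that only loosely mirror the methods listed above; this is not OpenJPA code, and cache misses (which would go to the database) are simply skipped.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the batching proposal: one getAll() per eager relationship.
// The cache maps each key to the keys of that object's eager related objects.
class BatchLoader {
    private final Map<String, List<String>> cache;
    int roundTrips = 0;

    BatchLoader(Map<String, List<String>> cache) {
        this.cache = cache;
    }

    private List<String> get(String key) {
        roundTrips++;
        return cache.get(key);
    }

    private Map<String, List<String>> getAll(List<String> keys) {
        roundTrips++;                               // one round trip for the whole relationship
        Map<String, List<String>> hits = new LinkedHashMap<>();
        for (String key : keys) {
            List<String> value = cache.get(key);
            if (value != null) {
                hits.put(key, value);
            }
        }
        return hits;
    }

    // find() of a single object still uses one get().
    void find(String key) {
        List<String> related = get(key);
        if (related != null) {
            initializeAll(related);
        }
    }

    // Loads a whole relationship with one getAll(), then recurses into each element's relations.
    void initializeAll(List<String> keys) {
        if (keys.isEmpty()) {
            return;
        }
        for (List<String> related : getAll(keys).values()) {
            initializeAll(related);
        }
    }

    // A customer with `orders` orders, each holding `productsPerOrder` distinct products.
    static Map<String, List<String>> buildCache(int orders, int productsPerOrder) {
        Map<String, List<String>> cache = new HashMap<>();
        List<String> orderKeys = new ArrayList<>();
        for (int o = 0; o < orders; o++) {
            List<String> productKeys = new ArrayList<>();
            for (int p = 0; p < productsPerOrder; p++) {
                String productKey = "product-" + o + "-" + p;
                cache.put(productKey, new ArrayList<>());
                productKeys.add(productKey);
            }
            String orderKey = "order-" + o;
            cache.put(orderKey, productKeys);
            orderKeys.add(orderKey);
        }
        cache.put("customerA", orderKeys);
        return cache;
    }

    public static void main(String[] args) {
        BatchLoader loader = new BatchLoader(buildCache(100, 10));
        loader.find("customerA");
        System.out.println(loader.roundTrips + " round trips"); // prints "102 round trips"
    }
}
```

For 100 orders of 10 products each, this takes 1 get() for the customer, 1 getAll() for the orders and 1 getAll() per order for its products: 102 round trips, versus 1101 when every object is fetched individually under the same assumptions. Batching level by level across all orders would cut this further, but that goes beyond what the proposal describes.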
> >
> > This is more like a batch fetch from the DataCache.  There should be a
> > significant performance improvement, especially in a distributed
> > environment, where communication/serialization is known to be the
> > bottleneck of the whole process.  This implementation could also provide
> > much better performance for third-party DataCache plug-ins that offer an
> > optimized getAll() operation.
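A back-of-the-envelope cost model shows why round trips dominate in a distributed cache. The figures (1 ms per network round trip, 10 µs per entry for serialization and transfer) are illustrative assumptions, not measurements, and the model assumes the per-entry cost is the same whether entries arrive one at a time or in a batch. RemoteCacheCostModel is a hypothetical helper.

```java
// Back-of-the-envelope model (an assumption, not a measurement): each round trip costs
// a fixed latency, and each entry costs a fixed serialization/transfer time whether it
// arrives alone or in a batch.
class RemoteCacheCostModel {
    static long costMicros(long roundTrips, long entries, long latencyMicros, long perEntryMicros) {
        return roundTrips * latencyMicros + entries * perEntryMicros;
    }

    public static void main(String[] args) {
        long entries = 100;   // e.g. the 100 orders of one customer
        long latency = 1_000; // illustrative: 1 ms per network round trip
        long perEntry = 10;   // illustrative: 10 us to (de)serialize and transfer one entry
        long oneByOne = costMicros(entries, entries, latency, perEntry); // 100 get() calls
        long batched = costMicros(1, entries, latency, perEntry);        // 1 getAll() call
        System.out.println("100 x get(): " + oneByOne + " us"); // prints "100 x get(): 101000 us"
        System.out.println("1 x getAll(): " + batched + " us");  // prints "1 x getAll(): 2000 us"
    }
}
```

Under these assumptions, 100 separate get() calls cost about 101,000 µs while one getAll() costs about 2,000 µs; the gap widens as network latency grows.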
> >
> > I hope this makes the issue clearer.  Please let me know if you have
> > further questions or other concerns.  Many thanks.
> >
> > Daniel
> >
> > On 5/24/07, Craig L Russell <Craig.Russell@sun.com> wrote:
> >
> > > Hi Daniel,
> > >
> > > On May 24, 2007, at 11:59 AM, Daniel Lee wrote:
> > >
> > > > Hi Craig,
> > > >
> > > > I think findAll() is different.  It is a client-level API, whereas the
> > > > getAll() here is for internal fetches from the data cache.
> > > >
> > > > In the example, when an application issues findAll() for a list of
> > > > customers, OpenJPA internally loads, for each customer with orders, the
> > > > "eager" relationship (orders) from the data cache if they are already
> > > > cached, calling map.get(orderId) for each order placed by the customer.
> > > > It likewise loads the items related to each order by calling
> > > > map.get(itemId) for each item, if the relationship to Order is declared
> > > > as eager.  This is potentially a performance bottleneck, and findAll()
> > > > does not avoid it.
> > >
> > > It seems this algorithm could be improved to use the broker's findAll
> > > mechanism when an instance is not found in the cache.  The not-found
> > > instances could then be loaded more efficiently than the current code
> > > does.
> > >
> > > Craig
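Craig's suggestion can be combined with a batched cache lookup: serve every hit from the cache, then load all misses with one batched query instead of one query per object. In this sketch FakeDatabase.loadAll() is a hypothetical stand-in for the broker's findAll path (think of a single WHERE id IN (...) query), and the hit loop stands in for one getAll() call on the cache; none of these names are OpenJPA APIs.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical database stand-in: loadAll() answers any number of keys with one
// query (think WHERE id IN (...)) and counts how many queries were issued.
class FakeDatabase {
    private final Map<String, String> rows = new HashMap<>();
    int queries = 0;

    void insert(String key, String value) {
        rows.put(key, value);
    }

    Map<String, String> loadAll(Collection<String> keys) {
        queries++;
        Map<String, String> result = new HashMap<>();
        for (String key : keys) {
            String value = rows.get(key);
            if (value != null) {
                result.put(key, value);
            }
        }
        return result;
    }
}

// Cache hits are served directly; all misses go to the database in one batch
// and are then written back to the cache.
class CacheThenBatchLoader {
    private final Map<String, String> cache = new HashMap<>();
    private final FakeDatabase db;

    CacheThenBatchLoader(FakeDatabase db) {
        this.db = db;
    }

    void preload(String key, String value) {
        cache.put(key, value);
    }

    Map<String, String> findAll(List<String> keys) {
        Map<String, String> found = new LinkedHashMap<>();
        List<String> misses = new ArrayList<>();
        for (String key : keys) {               // stands in for one cache getAll()
            String value = cache.get(key);
            if (value != null) {
                found.put(key, value);
            } else {
                misses.add(key);
            }
        }
        if (!misses.isEmpty()) {
            Map<String, String> loaded = db.loadAll(misses); // one query for every miss
            cache.putAll(loaded);                             // next lookup is a cache hit
            found.putAll(loaded);
        }
        return found;
    }
}
```

Because loaded misses are written back to the cache, a repeated findAll() over the same keys is served entirely from the cache without another database query.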
> > > >
> > > > Thanks.
> > > > Daniel
> > > >
> > > >
> > > > On 5/23/07, Craig L Russell <Craig.Russell@sun.com> wrote:
> > > >>
> > > >> Hi Daniel,
> > > >>
> > > >> Take a look at the findAll(Collection oids) method of
> > > >> OpenJPAEntityManager.  This should do a better job than N calls to
> > > >> get(Object key).
> > > >>
> > > >> Craig
> > > >>
> > > >> On May 23, 2007, at 3:55 PM, Daniel Lee wrote:
> > > >>
> > > >> > Are we missing a getAll(List keys) method for the data cache?
> > > >> >
> > > >> > When fetching objects with eager "to-many" relationships, the code
> > > >> > calls get(Object key) multiple times (once for each object in the
> > > >> > relationship).  For example, it makes one get() call for each order
> > > >> > placed by the customer we are fetching, which means 100 calls for a
> > > >> > customer with 100 orders.  Performance could be greatly improved if
> > > >> > we had a getAll(List keys) method that returned all orders in one
> > > >> > call.  This is especially important in a distributed environment.
> > > >> >
> > > >> > Is there a way (a new plug-in) to avoid the multiple round trips for
> > > >> > a single relationship, or can we implement code to improve
> > > >> > performance in this area?
> > > >> >
> > > >> > Many thanks.
> > > >> > Daniel
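The request amounts to one extra method on the cache interface. A minimal sketch, assuming a hypothetical KeyedCache interface rather than OpenJPA's actual DataCache: getAll() gets a default implementation that falls back to one get() per key, so existing plug-ins keep working, while a distributed plug-in can override it with a single bulk request. It uses a Java 8 default method for brevity; on older Java versions an abstract base class would play the same role.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical cache interface (not OpenJPA's DataCache).
interface KeyedCache<K, V> {
    V get(K key);

    // Fallback: one get() per key, so plug-ins that only implement get() still work.
    default Map<K, V> getAll(List<K> keys) {
        Map<K, V> result = new LinkedHashMap<>();
        for (K key : keys) {
            V value = get(key);
            if (value != null) {   // misses are simply omitted
                result.put(key, value);
            }
        }
        return result;
    }
}

// A plain local cache implements only get() and inherits the fallback getAll().
class LocalCache<K, V> implements KeyedCache<K, V> {
    private final Map<K, V> map = new HashMap<>();
    int getCalls = 0;

    void put(K key, V value) {
        map.put(key, value);
    }

    @Override
    public V get(K key) {
        getCalls++;
        return map.get(key);
    }
}

// A "remote" cache that overrides getAll() to answer the whole batch in one request.
class BulkRemoteCache<K, V> implements KeyedCache<K, V> {
    private final Map<K, V> map = new HashMap<>();
    int requests = 0;

    void put(K key, V value) {
        map.put(key, value);
    }

    @Override
    public V get(K key) {
        requests++;
        return map.get(key);
    }

    @Override
    public Map<K, V> getAll(List<K> keys) {
        requests++;                // one request regardless of batch size
        Map<K, V> result = new LinkedHashMap<>();
        for (K key : keys) {
            V value = map.get(key);
            if (value != null) {
                result.put(key, value);
            }
        }
        return result;
    }
}
```

Returning a Map keyed by oid rather than a plain List is a choice made here so callers can tell which keys missed; the thread itself only specifies getAll(List keys).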
> > > >>
> > > >> Craig Russell
> > > >> Architect, Sun Java Enterprise System http://java.sun.com/products/jdo
> > > >> 408 276-5638 mailto:Craig.Russell@sun.com
> > > >> P.S. A good JDO? O, Gasp!
> > > >>
> > > >>
> > > >>
> > >
> > > Craig Russell
> > > Architect, Sun Java Enterprise System http://java.sun.com/products/jdo
> > > 408 276-5638 mailto:Craig.Russell@sun.com
> > > P.S. A good JDO? O, Gasp!
> > >
> > >
> > >
> >
>
> --
> Marc Prud'hommeaux
> BEA Systems, Inc.
>
>
>
>
