openjpa-dev mailing list archives

From "Kevin Sutter" <kwsut...@gmail.com>
Subject Re: missing getAll(List keys) method?
Date Wed, 30 May 2007 19:48:28 GMT
Marc,
What are your views on this request?  Since you seem to be intimately
familiar with the data cache API, do you see a problem with introducing this
additional get method?  Either from an expectation viewpoint or an
implementation viewpoint?  Thanks.

Kevin

On 5/29/07, Daniel Lee <tsunfanglee@gmail.com> wrote:
>
> Hi Craig,
>
> The API under discussion (getAll) is for fetching objects that are
> already cached in the DataCache.  From what I understand, OpenJPA
> executes the following code when loading (find()) a customer that exists
> in the DataCache.  It loads not only the customer but also the objects in
> any eager (direct and indirect) relationships.  In the earlier example (a
> customer with 100 orders, each order with different products), the direct
> relationships are all orders placed by the customer and the indirect
> relationships are all products in those orders.
>
>    1. BrokerImpl.find() calls DataCacheStoreManager.initialize() to
>    initialize a new state manager for an object (a customer with 100
>    orders, for example).
>    2. initialize() then issues get() to the DataCache to see whether the
>    data (customer) is already cached.  After successfully getting the
>    customer (data != null) from the DataCache,
>    DataCachePCData.load(sm, fetch, edata) is invoked to load all the
>    eager relationships (orders in the example) of the object (customer).
>    3. PCDataImpl.load() loops through the relationship fields and calls
>    loadField() for each relationship that is not yet loaded.  In this
>    example, that is the customer's eager, one-to-many relationship to
>    its orders.
>    4. loadField() calls toField(), which is defined in AbstractPCData.
>    5. toField() LOOPS through all elements (orders) to invoke
>    toNestedField() for each element.  That is 100 toNestedField() calls
>    for the 100 orders in the example.
>    6. toNestedField() calls toRelationField(sm, vmd, data, fetch,
>    context), which actually calls find() and recursively goes back to
>    step 1 above to load a single order.  This ends up calling get() on
>    the DataCache 100 times for the 100 orders, and can possibly enter
>    another loop to load all products in each order, etc.
>
> Because of the loop in step 5 above, a single "find(customerA)" statement
> actually triggers 100 DataCache.get() calls for its orders, and possibly
> hundreds or thousands more get() calls for the products ordered by the
> customer.  As I understand it, this is a performance hit.
>
> If we had a getAll(List keys) method that returns a list of objects from
> the DataCache, we could change the logic to call the following new methods
> and get all elements (orders/products) in one relationship with a single
> call to getAll(), instead of calling get() a hundred times for 100 orders.
>
>    - toNestedFields() - called by toFields without the loop
>    - toRelationFields() - called by toNestedFields(); calls findAll()
>    - findAll() - needs to be able to initialize a List of sm and call
>    initializeAll()
>    - initializeAll() - calls getAll() instead of get(), then iterates over
>    the returned list to call load()
>
> This is more like doing a batch fetch from the DataCache.  There should be
> a significant performance improvement, especially in a distributed
> environment, where communication/serialization is known to be the
> bottleneck of the whole process.  This implementation could also
> provide much better performance for third-party DataCache plug-ins
> that provide and optimize a getAll() operation.
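
[Illustration] The difference between the per-element lookups and the proposed batch lookup can be sketched as below. This is only a minimal sketch under stated assumptions: CountingCache, loadOneByOne() and loadBatched() are hypothetical names invented for the illustration and are not OpenJPA classes or methods; getAll(List keys) is the method proposed in this thread, modeled here on a plain in-memory map where every call counts as one cache round trip.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BatchFetchSketch {

    /** Hypothetical stand-in for a data cache; NOT OpenJPA's DataCache. */
    static class CountingCache {
        private final Map<Object, Object> store = new HashMap<>();
        int roundTrips = 0; // every get() or getAll() call counts as one trip

        void put(Object key, Object value) {
            store.put(key, value);
        }

        Object get(Object key) {
            roundTrips++;
            return store.get(key);
        }

        /** The proposed batch lookup: one trip; a null entry marks a miss. */
        List<Object> getAll(List<?> keys) {
            roundTrips++;
            List<Object> results = new ArrayList<>(keys.size());
            for (Object key : keys) {
                results.add(store.get(key));
            }
            return results;
        }
    }

    /** Behaviour described in the steps above: one get() per element. */
    static List<Object> loadOneByOne(CountingCache cache, List<?> keys) {
        List<Object> loaded = new ArrayList<>(keys.size());
        for (Object key : keys) {
            loaded.add(cache.get(key));
        }
        return loaded;
    }

    /** Proposed behaviour: a single getAll() for the whole relationship. */
    static List<Object> loadBatched(CountingCache cache, List<?> keys) {
        return cache.getAll(keys);
    }

    public static void main(String[] args) {
        CountingCache cache = new CountingCache();
        List<String> orderIds = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            String id = "order-" + i;
            cache.put(id, "Order " + i);
            orderIds.add(id);
        }

        loadOneByOne(cache, orderIds);
        int perElementTrips = cache.roundTrips;

        cache.roundTrips = 0;
        loadBatched(cache, orderIds);

        // prints "100 get() calls vs 1 getAll() call"
        System.out.println(perElementTrips + " get() calls vs "
                + cache.roundTrips + " getAll() call");
    }
}
```

In a local in-memory cache the saving is small; the count matters when each trip involves network communication and serialization, which is the distributed case raised above. Returning null for a missing key is one possible convention for a batch lookup, chosen here only to keep the result list aligned with the key list.
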
>
> Hope this makes the issue clearer this time.  Please let me know if you
> have further questions or other concerns.  Many thanks.
>
> Daniel
>
> On 5/24/07, Craig L Russell <Craig.Russell@sun.com> wrote:
>
> > Hi Daniel,
> >
> > On May 24, 2007, at 11:59 AM, Daniel Lee wrote:
> >
> > > Hi Craig,
> > >
> > > I think findAll() is different.  It is a client-level API, and the
> > > getAll() here is for internal fetches from the data cache.
> > >
> > > In the example, when an application issues findAll() for a list of
> > > customers, it internally, for each customer with order(s), loads the
> > > "eager" relationship (orders) from the data cache, if they are already
> > > cached, by calling map.get(orderId) for each order placed by the
> > > customer.  It then loads the items related to each order by calling
> > > map.get(itemId) for each item, if the relationship to Order is
> > > declared as eager.  This is potentially a performance bottleneck, and
> > > findAll() does not avoid it.
> >
> > Seems that this algorithm can be improved to use the broker's findAll
> > mechanism when the instance is not found in the cache. The not-found
> > instances can be found more efficiently than the code currently does.
> >
> > Craig
> > >
> > > Thanks.
> > > Daniel
> > >
> > >
> > > On 5/23/07, Craig L Russell <Craig.Russell@sun.com> wrote:
> > >>
> > >> Hi Daniel,
> > >>
> > >> Take a look at the findAll(Collection oids) method of
> > >> OpenJPAEntityManager. This should do a better job than N get(Object
> > >> key) methods.
> > >>
> > >> Craig
> > >>
> > >> On May 23, 2007, at 3:55 PM, Daniel Lee wrote:
> > >>
> > >> > Are we missing a getAll(List keys) method for the data cache?
> > >> >
> > >> > When fetching objects with eager "to-many" relationships, the code
> > >> > calls get(Object key) multiple times (once for each object in the
> > >> > relationship).  For example, it makes one get() call for each order
> > >> > placed by the customer we are fetching, which means 100 calls for a
> > >> > customer with 100 orders.  Performance could be greatly improved if
> > >> > we had a getAll(List keys) method that returns all orders in one
> > >> > call.  This is especially important in a distributed environment.
> > >> >
> > >> > Is there a way (a new plug-in) to avoid the multiple trips for a
> > >> > single relationship, or can we implement code to improve
> > >> > performance in this area?
> > >> >
> > >> > Many thanks.
> > >> > Daniel
> > >>
> > >> Craig Russell
> > >> Architect, Sun Java Enterprise System http://java.sun.com/products/jdo
> > >> 408 276-5638 mailto:Craig.Russell@sun.com
> > >> P.S. A good JDO? O, Gasp!
> > >>
> > >>
> > >>
> >
> > Craig Russell
> > Architect, Sun Java Enterprise System http://java.sun.com/products/jdo
> > 408 276-5638 mailto:Craig.Russell@sun.com
> > P.S. A good JDO? O, Gasp!
> >
> >
> >
>
