openjpa-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Pinaki Poddar (JIRA)" <j...@apache.org>
Subject [jira] Commented: (OPENJPA-703) Cache ResultObjectProvider data to improve query performance
Date Sun, 31 Aug 2008 22:27:44 GMT

    [ https://issues.apache.org/jira/browse/OPENJPA-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627354#action_12627354
] 

Pinaki Poddar commented on OPENJPA-703:
---------------------------------------

> I just don't like seeing the SQL generation process taking over 30% of my program's execution
time when I simply execute the SAME QUERY OBJECT over and over again. 
Agreed. The issue with the proposed solution, however, is that the query instances are bound
to a persistence context (aka. EntityManager) -- unless one is reusing the same EntityManager
- an unlikely scenario in JEE container environment -- associating generated SQL with persistence
context is not going to be fruitful as the context will live within a single transaction and
then disappear.

I have now introduced a mechanism similar to your proposed idea to cache the SQLs generated
by OpenJPA. The cached SQL is  indexed by the original JPQL or NamedQuery moniker at EntityManagerFactory
level. 

The advantages of the approach are manifold:
   1. this caching mechanism caches *any* query. Not only findBy() as is the case with an
earlier attempt to SQL level query caching. 
   2. it provides a longer life span of the cache and hence same JPQL query Q when executed
by different EntityManagers can be now reused efficiently. 
   3.  the key for the query is natural and indirectly specified by the user to be unique
as the key is the JPQL itself or the NamedQuery moniker. So there is neither any overhead
of computing the query key nor there is any code coupling as context of what makes a query
unique grows.
   4. the mechanism is orthogonal and unobtrusive to OpenJPA kernel as well as the user. The
JPQL queries during first execution gets translated to SQL and any subsequent execution of
the query within the same persistent unit (not context) reuse the generated SQL directly,
bypassing the entire query formation (that 30% you mentioned) logic of the kernel. 
   5. The critical (and slightly hairy) issue is the parameter binding. As JPQL allows positional
as well as named parameter binding but SQL allows only positional binding, hence a bit of
care must be exercised to ensure that parameter binding remains consistent as we translate
a JPQL to SQL under the hood.

Initial performance result with this simple mechanics is promising and presented below:
  1. All measurements are taken on my laptop 
  2. Logging was turned off
  3. The database was empty -- so the query gets executed but no actual record is selected.
This is done purposefully to accentuate the incremental benefit of query construction vs query
caching. 
  4. All measurements are taken by 100 repeated executions of each query 

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
       Query                                                                             
                                                                            Time taken  in
ms                                          %Improvement
                                                                                         
                                                                        without cache    
 with cache
                                                                                         
                                                                                    t1   
              t2                                               PCT[(t1-t2)/t1]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1. select p from PObject p                                                               
                                                                625              578     
                                        +7%
    SELECT t0.id, t0.version, t0.name FROM PObject t0 

2. select p from PObject p where p.name = 'PObject'                                      
                                         562              516                            
                  +8%
    SELECT t0.id, t0.version, t0.name FROM PObject t0 WHERE (t0.name = 'PObject')

3. select p from PObject p where p.name = :param                                         
                                         531               500                           
                  +6%
    SELECT t0.id, t0.version, t0.name FROM PObject t0 WHERE (t0.name = ?)

4. select e from Employee e where e.name = :emp and e.department.name = :dept            
            594             546                                               +8%
                and e.department.company.name = :company and e.address.zip = :zip
    SELECT t0.id, t3.id, t3.city, t3.state, t3.street, t3.zip, t1.id, t2.id, t2.name, t1.name,
t0.name 
        FROM Employee t0 INNER JOIN Department t1 ON t0.DEPARTMENT_ID = t1.id 
        INNER JOIN Address t3 ON t0.ADDRESS_ID = t3.id 
        INNER JOIN Company t2 ON t1.COMPANY_ID = t2.id 
        WHERE (t0.name = ? AND t1.name = ? AND t2.name = ? AND t3.zip = ?)
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

The current limitation is:
  a) queries that result into more than one SQL statements (e.g. union or parallel queries)
are not amenable to caching.
  b) no cache invalidation on fetch plan changes -- but the existing caching does not address
that issue either. 



   An earlier attempt to caching SQL query (as configured by jdbc.QuerySQLCache) is based
on associating a SQL string to the implementation-internal Select objects and caching the
Select objects themselves. This current work does overlap with this effort and hence some
explanation is called for on why I have considered an alternative implementation for SQL query
caching while one already exists. My primary concern with existing implementation for query
caching are as follows:
       a) it is non-orthogonal to other design constructs. For example, when a relation is
being loaded (a very critical and common code logic) the code becomes aware whether query
caching is in effect and, more critically, takes a distinctly different path.
       b) the code logic of JDBCStoreManager -- another critical component of OpenJPA -- also
becomes aware of query caching and starts bifurcating code paths. This approach can lead to
poor maintainability and consistency in future.
       c) it only works for findBy() and trying to generalize it for other queries will spread
the code to many other mapping strategies -- accentuating the above mentioned design concerns.
       d) it does need to compute a Select key and also caches a Select -- a very heavy object
-- causing a poor memory usage profile. 

   I did not find any test case that measure performance gain of this existing approach. Can
anyone (whoever has cared to read up to this point:) please point me to any performance result
obtained with existing caching implementation so that I can do a fair comparative performance
evaluation of the existing approach and this current one?

 

> Cache ResultObjectProvider data to improve query performance
> ------------------------------------------------------------
>
>                 Key: OPENJPA-703
>                 URL: https://issues.apache.org/jira/browse/OPENJPA-703
>             Project: OpenJPA
>          Issue Type: Improvement
>          Components: kernel
>            Reporter: Ron Pressler
>
> Profiling indicated that JDBCStoreQuery.populateSelect consumes a significant amount
of CPU, and is executed every time a query is run. While, in fact, the actual PreparedStatement
is created and run only in QueryImpl.toResult. It seems like the returned ResultObjectProvider
from JDBCStoreQuery.executeQuery can be at least partially cached, or even cached in its entirety
(provided care is taken with the context parameters). 
> It seems like such an improvement would significantly improve query performance.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message