db-derby-dev mailing list archives

From "Suresh Thalamati (JIRA)" <derby-...@db.apache.org>
Subject [jira] Created: (DERBY-524) weight-based page cache might improve Derby throughput by keeping more heavily used pages in the page cache
Date Fri, 19 Aug 2005 22:43:54 GMT
weight-based page cache might improve Derby throughput by keeping more heavily used pages
in the page cache
----------------------------------------------------------------------------------------------------------------

         Key: DERBY-524
         URL: http://issues.apache.org/jira/browse/DERBY-524
     Project: Derby
        Type: New Feature
  Components: Services  
    Versions: 10.1.1.1    
    Reporter: Suresh Thalamati


This issue was discussed on the derby-dev list along with the online backup (DERBY-239) design,
because online backup will read pages into the cache and potentially replace active user
pages in the cache.

Comments from the list related to this:
http://mail-archives.apache.org/mod_mbox/db-derby-dev/200507.mbox/%3c42E50861.8020504@sbcglobal.net%3e
Mike wrote:
I also agree that page cache enhancement is interesting, but probably
should be tackled as a separate project.  But keeping this goal in mind
while making changes for backup is a good thing.  An interface
that allows backup to use/reuse a single buffer in the page cache seems
reasonable.  Specializing it would seem to allow some optimizations where
free page searching could be avoided for this operation, which at
a very low level is going to be pushing/pulling pages as fast as possible.

I have seen the following ideas work well in a weight-based page cache.  They try to limit
the overhead of weights by using multiple LRU queues, while still keeping some of the benefit
of a weight-based scheme:
1) have a much smaller range than 0-100, something like 5, where each
   value is its own LRU queue.  This reduces the overhead of searching
   and sorting based on weight.
2) as Dan suggests, something like:
   no weight: free list
   0: backup page, linear scan heap pages, read ahead
   1: probe accessed heap page
   2: leaf page
   3: non-leaf page
   4: root
3) to account for re-reference, pages move up in value when re-referenced.  Revaluing
   happens only when a page is accessed, so the page is already latched, which limits
   the additional overhead needed to reweigh the page.
   Various methods can be used for moving pages down in value:
    o whole queues at a time
    o individual pages in LRU order, based on some sort of clock, like the current clock
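
The three points above can be sketched roughly as follows. This is only an
illustration, not Derby code: the class name MultiQueuePageCache, its methods,
and the use of bare page ids are all assumptions made for the example, and
latching and I/O are left out. It keeps one LRU queue per weight level (0-4),
moves a page up one level when it is re-referenced, evicts from the lowest
non-empty level first, and ages pages by shifting whole queues down.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical sketch: one LRU queue per weight level, 0 (evicted first) to 4. */
class MultiQueuePageCache {
    static final int LEVELS = 5;

    private final int capacity;
    // Each LinkedHashSet keeps insertion order, so its first element is the LRU page.
    private final List<LinkedHashSet<Long>> queues = new ArrayList<>();

    MultiQueuePageCache(int capacity) {
        this.capacity = capacity;
        for (int i = 0; i < LEVELS; i++) {
            queues.add(new LinkedHashSet<>());
        }
    }

    /** Returns the level currently holding the page, or -1 if it is not cached. */
    int levelOf(long pageId) {
        for (int i = 0; i < LEVELS; i++) {
            if (queues.get(i).contains(pageId)) {
                return i;
            }
        }
        return -1;
    }

    int size() {
        int n = 0;
        for (LinkedHashSet<Long> q : queues) {
            n += q.size();
        }
        return n;
    }

    /**
     * Records an access. A cached page moves up one level (re-reference);
     * a new page enters at initialLevel, evicting a victim if the cache is full.
     * Returns the evicted page id, or null if nothing was evicted.
     */
    Long access(long pageId, int initialLevel) {
        int level = levelOf(pageId);
        if (level >= 0) {
            queues.get(level).remove(pageId);
            queues.get(Math.min(level + 1, LEVELS - 1)).add(pageId);
            return null;
        }
        Long victim = (size() >= capacity) ? evict() : null;
        queues.get(initialLevel).add(pageId);
        return victim;
    }

    /** Evicts the least recently used page of the lowest non-empty level. */
    private Long evict() {
        for (LinkedHashSet<Long> q : queues) {
            if (!q.isEmpty()) {
                Long victim = q.iterator().next();
                q.remove(victim);
                return victim;
            }
        }
        return null;
    }

    /** "Whole queues at a time" aging: every page above level 0 drops one level. */
    void ageAll() {
        for (int i = 1; i < LEVELS; i++) {
            queues.get(i - 1).addAll(queues.get(i));
            queues.get(i).clear();
        }
    }
}
```

With this layout, choosing a victim never requires sorting by weight; the cost
is at most a scan over the five queue heads. When and how often to call
ageAll() is left open here, as it is in the discussion above.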



Øystein Grøvlen wrote:

>>>>>> "DJD" == Daniel John Debrunner <djd@debrunners.com> writes:
>
>
>
>     DJD> I think modifications to the cache would be useful for b), so
>     DJD> that entries in the cache (through generic apis, not specific
>     DJD> to store) could mark how "useful/valuable" they are. Just a
>     DJD> simple scheme, lower numbers less valuable, higher numbers
>     DJD> more valuable, and if it makes it easier to fix a range,
>     DJD> e.g. 0-100, then that would be ok. Then the store could add
>     DJD> pages to the cache with this weighting, e.g. (to get the
>     DJD> general idea)
>
>     DJD>      pages for backup - weight 0
>     DJD>      overflow column pages - weight 10
>     DJD>      regular pages - weight 20
>     DJD>      leaf index pages - weight 30
>     DJD>      root index pages - weight 80
>
>     DJD> This weight would then be factored into the decision to throw pages out
>     DJD> or not.
>
> I agree that we need some mechanism to prevent operations from filling
> the cache with pages that are not likely to be accessed again in the
> near future.  However, I am afraid that a very detailed "cost-based"
> scheme may create a significant overhead compared to a simple LRU
> scheme.
>
> One may operate with separate LRU queues for different weights, but I
> guess the number of possible weights will have to be restricted in
> that case.
>
> I am also not convinced that it is the type of page that is the most
> important criterion for caching.  What matters is access frequency.
> The page type may give a hint, but leaf pages of one index may be more
> frequently accessed than root pages of other indexes.
>
> The type of access is also a relevant criterion.  Pages accessed
> sequentially are often less likely to be accessed again in the near
> future than pages accessed by direct lookup.  A separate LRU queue for
> sequentially accessed pages may prevent backup and other sequential
> scans (e.g., select * from t) from throwing out directly accessed
> pages (e.g., index pages and data pages accessed through indexes).
>
>     DJD> This project could be independent of the online backup and could have
>     DJD> benefits elsewhere.
>
> I agree.
>
>
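
The separate LRU queue for sequentially accessed pages suggested in the quoted
message might look something like the sketch below. It is an illustration
under assumptions, not Derby code: ScanResistantCache, its Access enum, and
the rule "always evict from the sequential queue first" are invented for the
example. A real implementation would probably need to bound the two queues
separately so that direct pages can still be aged out.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;

/** Hypothetical sketch: sequentially read pages live in their own LRU queue. */
class ScanResistantCache {
    enum Access { SEQUENTIAL, DIRECT }

    private final int capacity;
    // accessOrder=true makes iteration order least-recently-used first.
    private final LinkedHashMap<Long, Boolean> sequential =
            new LinkedHashMap<>(16, 0.75f, true);
    private final LinkedHashMap<Long, Boolean> direct =
            new LinkedHashMap<>(16, 0.75f, true);

    ScanResistantCache(int capacity) {
        this.capacity = capacity;
    }

    boolean contains(long pageId) {
        return sequential.containsKey(pageId) || direct.containsKey(pageId);
    }

    void access(long pageId, Access kind) {
        if (direct.containsKey(pageId)) {
            direct.get(pageId); // refresh the LRU position
            return;
        }
        if (sequential.containsKey(pageId)) {
            if (kind == Access.DIRECT) {
                // A direct lookup promotes the page out of the scan queue.
                sequential.remove(pageId);
                direct.put(pageId, Boolean.TRUE);
            } else {
                sequential.get(pageId); // refresh the LRU position
            }
            return;
        }
        if (sequential.size() + direct.size() >= capacity) {
            evictOne();
        }
        (kind == Access.SEQUENTIAL ? sequential : direct).put(pageId, Boolean.TRUE);
    }

    /** Evicts the LRU sequential page if there is one, else the LRU direct page. */
    private void evictOne() {
        LinkedHashMap<Long, Boolean> q = sequential.isEmpty() ? direct : sequential;
        Iterator<Long> it = q.keySet().iterator();
        if (it.hasNext()) {
            it.next();
            it.remove();
        }
    }
}
```

In this sketch a long scan (backup, or select * from t) only ever recycles the
slots held by other scanned pages, so pages reached through index lookups stay
cached.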



-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

