ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ivan Rakov <ivan.glu...@gmail.com>
Subject IGNITE-4534 - Ignite 2.0 eviction design
Date Tue, 14 Feb 2017 17:13:21 GMT
Hello Igniters,

I want to share some thoughts about supporting eviction on new page 
memory storage.

I think, we should reconsider eviction policies in their current state. 
Current pluggable eviction policy mechanism allows to determine which 
key will be evicted from the cache and when eviction should be 
performed. There are two issues about it:

1) Such mechanism tends to make off-heap storage highly fragmented: 
batch of small evicted entries can accumulate needed total amount of 
free space in FreeList, but it still would be impossible to store one 
big entry.

2) Growth of cache size implies creating O(n) heap objects and links to 
maintain eviction policy, which causes linear growth of GC load. That 
doesn't suit us: getting rid of GC overhead was one of the main reasons 
for keeping cache data in off-heap memory.

I propose to use "whole page eviction, off-heap eviction metadata" as 
default mode for eviction. It's almost non-deterministic, because actual 
eviction result depends on how data is packed into pages, but is more 
efficient, and shouldn't cause long-term performance degradation due to 
fragmentation.

I can suggest two ways to implement it:

1) *K-random-LRU
*The idea is to allocate big off-heap array, to track timestamp of last 
usage for each data page.
When data page on address X is accessed, we store current timestamp in X 
/ PAGE_SIZE array position.
When there's a need for eviction, we randomly choose K indices of array 
(if some of indices point to non-data pages, re-choose them) and evict 
data page with minimum timestamp. In case K=5, we'll evict page from 17% 
least recently used set with 50% probability 
<http://math.stackexchange.com/questions/786392/expectation-of-minimum-of-n-i-i-d-uniform-random-variables>.
Non-data pages can be simply recognized by having zero in corresponding 
position of timestamp array.
If entry is too big and occupies more than one page, only first page 
will be tracked, and other will be considered as non-data pages.
*K-random-LRU-2* is perspective variant of this approach. We'll evict 
page with minimum time of penultimate access. Implementation is mostly 
the same, the difference is that we should store two timestamps for each 
data page. LRU-2 outperforms 
<http://www.cs.cmu.edu/%7Echristos/courses/721-resources/p297-o_neil.pdf> 
LRU by resolving "one-hit wonder" problem: if page is accessed very 
rarely, but accidentally accessed once, it's "protected" from eviction 
for long time.
Memory overhead: timestamp can be packed into 4 bytes. 100GB off-heap 
cache with standard 2KB pages requires 200MB of additional memory (400MB 
in case of LRU-2 strategy).*
*

2)*Adaptation of CLOCK-Pro
*CLOCK-Pro (article 
<https://www.usenix.org/legacy/event/usenix05/tech/general/full_papers/jiang/jiang_html/html.html>,

presentation <http://www.slideshare.net/huliang64/clockpro>) is modern 
page replacement algorithm, used in actual OS kernels. It is an 
approximation of LIRS 
<http://web.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-02-6.pdf> 
with significant advantage for our case: instead of doubly linked list, 
which is hard to maintain in off-heap memory, CLOCK-Pro uses only 
circular list and three pointers. However, we can't use CLOCK-Pro as is: 
there's difference between OS paged memory and Ignite off-heap storage. 
In OS, if memory page is swapped out, it still contains the same data, 
with same relevance and patterns of access. That's why keeping metadata 
for swapped-out page (see Non-resident Cold Pages set) works. In Ignite, 
in case we evict page, all data is discarded; entries with same keys may 
appear in the cache again, but they will be evenly distributed across 
the whole storage.
I propose to use simplified version of CLOCK-Pro with only two pointers 
and two page sets: hot and cold. Non-resident pages won't be tracked.
To implement it, we should allocate big off-heap array to track flags 
for each page. We should store (hot, cold) bit, (accessed, not accessed) 
bit and one bit indicating that the page is data page, totally 3 bits 
per page. 100GB off-heap cache with standard 2KB pages requires ~20MB of 
additional memory, or 50MB in case we don't want to shrink metadata for 
multiple pages in one byte.
Memory overhead is really small, but we can't predict average and 
maximum time for one eviction, it has to be discovered during an 
experiment (in hypothetical worst case, clock hand can travel through 
the whole array). Also, hot/cold ratio can affect performance 
dramatically and should be tuned well.

Note about classic "key" eviction: fair per-entry policies (FIFO, LRU, 
Sorted) may be useful for some users, and we may want to support them as 
well. It's no problem at all to implement them on-heap (more than, 
current implementations of EvictionPolicy will work in if we make minor 
changes in the code). In off-heap, FIFO is easy to implement (circular 
buffer), but LRU (and more effective LIRS, if someone wants it) require 
doubly linked lists. The only option I see is using our BPlusTree with 
some kind of implicit key, but this implies O(log n) overhead on each 
entry read access.

Another note: current GridCacheEvictionManager have eviction backup 
synchronization mechanism. Do we want to support it for off-heap storage 
as well?

Please let me know if my understanding of any algorithm or part of the 
system is wrong.

--
Best Regards,
Ivan Rakov


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message