ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dmitriy Setrakyan <dsetrak...@apache.org>
Subject Re: IGNITE-4534 - Ignite 2.0 eviction design
Date Tue, 14 Feb 2017 21:59:53 GMT
Ivan, thanks for the detailed explanation. My preference would be to
support all of the suggested eviction policies and control them from the
configuration. However, it does sound the K-random-LRU or K-randome-LRU-2
will be much easier to support than LIRs.

Don't we already have something like K-random-LRU in place?

D.

On Tue, Feb 14, 2017 at 9:13 AM, Ivan Rakov <ivan.glukos@gmail.com> wrote:

> Hello Igniters,
>
> I want to share some thoughts about supporting eviction on new page memory
> storage.
>
> I think, we should reconsider eviction policies in their current state.
> Current pluggable eviction policy mechanism allows to determine which key
> will be evicted from the cache and when eviction should be performed. There
> are two issues about it:
>
> 1) Such mechanism tends to make off-heap storage highly fragmented: batch
> of small evicted entries can accumulate needed total amount of free space
> in FreeList, but it still would be impossible to store one big entry.
>
> 2) Growth of cache size implies creating O(n) heap objects and links to
> maintain eviction policy, which causes linear growth of GC load. That
> doesn't suit us: getting rid of GC overhead was one of the main reasons for
> keeping cache data in off-heap memory.
>
> I propose to use "whole page eviction, off-heap eviction metadata" as
> default mode for eviction. It's almost non-deterministic, because actual
> eviction result depends on how data is packed into pages, but is more
> efficient, and shouldn't cause long-term performance degradation due to
> fragmentation.
>
> I can suggest two ways to implement it:
>
> 1) *K-random-LRU
> *The idea is to allocate big off-heap array, to track timestamp of last
> usage for each data page.
> When data page on address X is accessed, we store current timestamp in X /
> PAGE_SIZE array position.
> When there's a need for eviction, we randomly choose K indices of array
> (if some of indices point to non-data pages, re-choose them) and evict data
> page with minimum timestamp. In case K=5, we'll evict page from 17% least
> recently used set with 50% probability <http://math.stackexchange.com
> /questions/786392/expectation-of-minimum-of-n-i-i-d-uniform-
> random-variables>.
> Non-data pages can be simply recognized by having zero in corresponding
> position of timestamp array.
> If entry is too big and occupies more than one page, only first page will
> be tracked, and other will be considered as non-data pages.
> *K-random-LRU-2* is perspective variant of this approach. We'll evict page
> with minimum time of penultimate access. Implementation is mostly the same,
> the difference is that we should store two timestamps for each data page.
> LRU-2 outperforms <http://www.cs.cmu.edu/%7Echri
> stos/courses/721-resources/p297-o_neil.pdf> LRU by resolving "one-hit
> wonder" problem: if page is accessed very rarely, but accidentally accessed
> once, it's "protected" from eviction for long time.
> Memory overhead: timestamp can be packed into 4 bytes. 100GB off-heap
> cache with standard 2KB pages requires 200MB of additional memory (400MB in
> case of LRU-2 strategy).*
> *
>
> 2)*Adaptation of CLOCK-Pro
> *CLOCK-Pro (article <https://www.usenix.org/legacy
> /event/usenix05/tech/general/full_papers/jiang/jiang_html/html.html>,
> presentation <http://www.slideshare.net/huliang64/clockpro>) is modern
> page replacement algorithm, used in actual OS kernels. It is an
> approximation of LIRS <http://web.cse.ohio-state.edu
> /hpcs/WWW/HTML/publications/papers/TR-02-6.pdf> with significant
> advantage for our case: instead of doubly linked list, which is hard to
> maintain in off-heap memory, CLOCK-Pro uses only circular list and three
> pointers. However, we can't use CLOCK-Pro as is: there's difference between
> OS paged memory and Ignite off-heap storage. In OS, if memory page is
> swapped out, it still contains the same data, with same relevance and
> patterns of access. That's why keeping metadata for swapped-out page (see
> Non-resident Cold Pages set) works. In Ignite, in case we evict page, all
> data is discarded; entries with same keys may appear in the cache again,
> but they will be evenly distributed across the whole storage.
> I propose to use simplified version of CLOCK-Pro with only two pointers
> and two page sets: hot and cold. Non-resident pages won't be tracked.
> To implement it, we should allocate big off-heap array to track flags for
> each page. We should store (hot, cold) bit, (accessed, not accessed) bit
> and one bit indicating that the page is data page, totally 3 bits per page.
> 100GB off-heap cache with standard 2KB pages requires ~20MB of additional
> memory, or 50MB in case we don't want to shrink metadata for multiple pages
> in one byte.
> Memory overhead is really small, but we can't predict average and maximum
> time for one eviction, it has to be discovered during an experiment (in
> hypothetical worst case, clock hand can travel through the whole array).
> Also, hot/cold ratio can affect performance dramatically and should be
> tuned well.
>
> Note about classic "key" eviction: fair per-entry policies (FIFO, LRU,
> Sorted) may be useful for some users, and we may want to support them as
> well. It's no problem at all to implement them on-heap (more than, current
> implementations of EvictionPolicy will work in if we make minor changes in
> the code). In off-heap, FIFO is easy to implement (circular buffer), but
> LRU (and more effective LIRS, if someone wants it) require doubly linked
> lists. The only option I see is using our BPlusTree with some kind of
> implicit key, but this implies O(log n) overhead on each entry read access.
>
> Another note: current GridCacheEvictionManager have eviction backup
> synchronization mechanism. Do we want to support it for off-heap storage as
> well?
>
> Please let me know if my understanding of any algorithm or part of the
> system is wrong.
>
> --
> Best Regards,
> Ivan Rakov
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message