ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dmitriy Setrakyan <dsetrak...@apache.org>
Subject Re: IGNITE-4534 - Ignite 2.0 eviction design
Date Wed, 15 Feb 2017 19:13:42 GMT
On Wed, Feb 15, 2017 at 9:06 AM, Alexey Goncharuk <
alexey.goncharuk@gmail.com> wrote:

> Dmitriy,
>
> For the current off-heap approach, we only have a per-entry LRU eviction
> policy which does not fit well page memory architecture. I like the
> adaptation of clock pro algorithm because it requires a significantly
> smaller amount of memory to track page updates and is scan-resistant.
>

Agree.


>
> I want to back Ivan here and discuss whether we need synchronized evictions
> at all. Given that current implementation has flaws and correct
> implementation should work with the same protocol guarantees as a cache
> update (meaning two-phase commit for transactional caches on every
> operation, including reads). I would rather drop synchronous evictions at
> all.
>
> Thoughts?
>

Agree again. Synchronous evictions only make sense in pure in-memory
approach, where there is no database at all. AFAIK, most of the Ignite
deployments are backed by a database. If not, then users should control the
evictions themselves.

However, I am always concerned when removing a certain feature. If we start
getting complaints, we may have to add it in 2.0.


>
> 2017-02-15 0:59 GMT+03:00 Dmitriy Setrakyan <dsetrakyan@apache.org>:
>
> > Ivan, thanks for the detailed explanation. My preference would be to
> > support all of the suggested eviction policies and control them from the
> > configuration. However, it does sound the K-random-LRU or K-randome-LRU-2
> > will be much easier to support than LIRs.
> >
> > Don't we already have something like K-random-LRU in place?
> >
> > D.
> >
> > On Tue, Feb 14, 2017 at 9:13 AM, Ivan Rakov <ivan.glukos@gmail.com>
> wrote:
> >
> > > Hello Igniters,
> > >
> > > I want to share some thoughts about supporting eviction on new page
> > memory
> > > storage.
> > >
> > > I think, we should reconsider eviction policies in their current state.
> > > Current pluggable eviction policy mechanism allows to determine which
> key
> > > will be evicted from the cache and when eviction should be performed.
> > There
> > > are two issues about it:
> > >
> > > 1) Such mechanism tends to make off-heap storage highly fragmented:
> batch
> > > of small evicted entries can accumulate needed total amount of free
> space
> > > in FreeList, but it still would be impossible to store one big entry.
> > >
> > > 2) Growth of cache size implies creating O(n) heap objects and links to
> > > maintain eviction policy, which causes linear growth of GC load. That
> > > doesn't suit us: getting rid of GC overhead was one of the main reasons
> > for
> > > keeping cache data in off-heap memory.
> > >
> > > I propose to use "whole page eviction, off-heap eviction metadata" as
> > > default mode for eviction. It's almost non-deterministic, because
> actual
> > > eviction result depends on how data is packed into pages, but is more
> > > efficient, and shouldn't cause long-term performance degradation due to
> > > fragmentation.
> > >
> > > I can suggest two ways to implement it:
> > >
> > > 1) *K-random-LRU
> > > *The idea is to allocate big off-heap array, to track timestamp of last
> > > usage for each data page.
> > > When data page on address X is accessed, we store current timestamp in
> X
> > /
> > > PAGE_SIZE array position.
> > > When there's a need for eviction, we randomly choose K indices of array
> > > (if some of indices point to non-data pages, re-choose them) and evict
> > data
> > > page with minimum timestamp. In case K=5, we'll evict page from 17%
> least
> > > recently used set with 50% probability <http://math.stackexchange.com
> > > /questions/786392/expectation-of-minimum-of-n-i-i-d-uniform-
> > > random-variables>.
> > > Non-data pages can be simply recognized by having zero in corresponding
> > > position of timestamp array.
> > > If entry is too big and occupies more than one page, only first page
> will
> > > be tracked, and other will be considered as non-data pages.
> > > *K-random-LRU-2* is perspective variant of this approach. We'll evict
> > page
> > > with minimum time of penultimate access. Implementation is mostly the
> > same,
> > > the difference is that we should store two timestamps for each data
> page.
> > > LRU-2 outperforms <http://www.cs.cmu.edu/%7Echri
> > > stos/courses/721-resources/p297-o_neil.pdf> LRU by resolving "one-hit
> > > wonder" problem: if page is accessed very rarely, but accidentally
> > accessed
> > > once, it's "protected" from eviction for long time.
> > > Memory overhead: timestamp can be packed into 4 bytes. 100GB off-heap
> > > cache with standard 2KB pages requires 200MB of additional memory
> (400MB
> > in
> > > case of LRU-2 strategy).*
> > > *
> > >
> > > 2)*Adaptation of CLOCK-Pro
> > > *CLOCK-Pro (article <https://www.usenix.org/legacy
> > > /event/usenix05/tech/general/full_papers/jiang/jiang_html/html.html>,
> > > presentation <http://www.slideshare.net/huliang64/clockpro>) is modern
> > > page replacement algorithm, used in actual OS kernels. It is an
> > > approximation of LIRS <http://web.cse.ohio-state.edu
> > > /hpcs/WWW/HTML/publications/papers/TR-02-6.pdf> with significant
> > > advantage for our case: instead of doubly linked list, which is hard to
> > > maintain in off-heap memory, CLOCK-Pro uses only circular list and
> three
> > > pointers. However, we can't use CLOCK-Pro as is: there's difference
> > between
> > > OS paged memory and Ignite off-heap storage. In OS, if memory page is
> > > swapped out, it still contains the same data, with same relevance and
> > > patterns of access. That's why keeping metadata for swapped-out page
> (see
> > > Non-resident Cold Pages set) works. In Ignite, in case we evict page,
> all
> > > data is discarded; entries with same keys may appear in the cache
> again,
> > > but they will be evenly distributed across the whole storage.
> > > I propose to use simplified version of CLOCK-Pro with only two pointers
> > > and two page sets: hot and cold. Non-resident pages won't be tracked.
> > > To implement it, we should allocate big off-heap array to track flags
> for
> > > each page. We should store (hot, cold) bit, (accessed, not accessed)
> bit
> > > and one bit indicating that the page is data page, totally 3 bits per
> > page.
> > > 100GB off-heap cache with standard 2KB pages requires ~20MB of
> additional
> > > memory, or 50MB in case we don't want to shrink metadata for multiple
> > pages
> > > in one byte.
> > > Memory overhead is really small, but we can't predict average and
> maximum
> > > time for one eviction, it has to be discovered during an experiment (in
> > > hypothetical worst case, clock hand can travel through the whole
> array).
> > > Also, hot/cold ratio can affect performance dramatically and should be
> > > tuned well.
> > >
> > > Note about classic "key" eviction: fair per-entry policies (FIFO, LRU,
> > > Sorted) may be useful for some users, and we may want to support them
> as
> > > well. It's no problem at all to implement them on-heap (more than,
> > current
> > > implementations of EvictionPolicy will work in if we make minor changes
> > in
> > > the code). In off-heap, FIFO is easy to implement (circular buffer),
> but
> > > LRU (and more effective LIRS, if someone wants it) require doubly
> linked
> > > lists. The only option I see is using our BPlusTree with some kind of
> > > implicit key, but this implies O(log n) overhead on each entry read
> > access.
> > >
> > > Another note: current GridCacheEvictionManager have eviction backup
> > > synchronization mechanism. Do we want to support it for off-heap
> storage
> > as
> > > well?
> > >
> > > Please let me know if my understanding of any algorithm or part of the
> > > system is wrong.
> > >
> > > --
> > > Best Regards,
> > > Ivan Rakov
> > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message