ignite-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aleksey Plekhanov (Jira)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-6930) Optionally to do not write free list updates to WAL
Date Thu, 03 Oct 2019 13:29:01 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943607#comment-16943607
] 

Aleksey Plekhanov commented on IGNITE-6930:
-------------------------------------------

The patch is ready. To minimize WAL record I've used next approach:

There is a small on-heap pages list cache allocated for each bucket. There are three types
of operations with free-lists: put the page to the tail of the bucket (after insert and remove
row), take a page from the tail of the bucket (before insert row), remove the page from the
bucket (before remove row), each of these operations first look into the pages cache, then
work with page memory.

There is no WAL record needed if the page uses only buckets pages cache. So, it's possible
then the page was put into free-list, moved through the bucket, leave the free list and hasn't
produced any free-list WAL record at all.

On-heap pages cache is flushed to page memory before each checkpoint to ensure the same recovery
guarantees as now (physical WAL records are restored from WAL only to the moment of the last
unsuccessful checkpoint if it was started, so we need only final buckets state at the moment
of checkpoint). 

[~ivan.glukos], could you please have a look?

 

> Optionally to do not write free list updates to WAL
> ---------------------------------------------------
>
>                 Key: IGNITE-6930
>                 URL: https://issues.apache.org/jira/browse/IGNITE-6930
>             Project: Ignite
>          Issue Type: Task
>          Components: cache
>            Reporter: Vladimir Ozerov
>            Assignee: Aleksey Plekhanov
>            Priority: Major
>              Labels: IEP-8, performance
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When cache entry is created, we need to write update the free list. When entry is updated,
we need to update free list(s) several times. Currently free list is persistent structure,
so every update to it must be logged to be able to recover after crash. This may incur significant
overhead, especially for small entries.
> E.g. this is how WAL for a single update looks like. "D" - updates with real data, "F"
- free-list management:
> {code}
>  1. [D] DataRecord [writeEntries=[UnwrapDataEntry[k = key, v = [ BinaryObject [idHash=2053299190,
hash=1986931360, typeId=-1580729813]], super = [DataEntry [cacheId=94416770, op=UPDATE, writeVer=GridCacheVersion
[topVer=122147562, order=1510667560607, nodeOrder=1], partId=0, partCnt=4]]]], super=WALRecord
[size=0, chainSize=0, pos=null, type=DATA_RECORD]]
>  2. [F] PagesListRemovePageRecord [rmvdPageId=0001000000000005, pageId=0001000000000006,
grpId=94416770, super=PageDeltaRecord [grpId=94416770, pageId=0001000000000006, super=WALRecord
[size=37, chainSize=0, pos=null, type=PAGES_LIST_REMOVE_PAGE]]]
>  3. [D] DataPageInsertRecord [super=PageDeltaRecord [grpId=94416770, pageId=0001000000000005,
super=WALRecord [size=129, chainSize=0, pos=null, type=DATA_PAGE_INSERT_RECORD]]]
>  4. [F] PagesListAddPageRecord [dataPageId=0001000000000005, super=PageDeltaRecord [grpId=94416770,
pageId=0001000000000008, super=WALRecord [size=37, chainSize=0, pos=null, type=PAGES_LIST_ADD_PAGE]]]
>  5. [F] DataPageSetFreeListPageRecord [freeListPage=281474976710664, super=PageDeltaRecord
[grpId=94416770, pageId=0001000000000005, super=WALRecord [size=37, chainSize=0, pos=null,
type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
>  6. [D] ReplaceRecord [io=DataLeafIO[ver=1], idx=0, super=PageDeltaRecord [grpId=94416770,
pageId=0001000000000004, super=WALRecord [size=47, chainSize=0, pos=null, type=BTREE_PAGE_REPLACE]]]
>  7. [F] DataPageRemoveRecord [itemId=0, super=PageDeltaRecord [grpId=94416770, pageId=0001000000000005,
super=WALRecord [size=30, chainSize=0, pos=null, type=DATA_PAGE_REMOVE_RECORD]]]
>  8. [F] PagesListRemovePageRecord [rmvdPageId=0001000000000005, pageId=0001000000000008,
grpId=94416770, super=PageDeltaRecord [grpId=94416770, pageId=0001000000000008, super=WALRecord
[size=37, chainSize=0, pos=null, type=PAGES_LIST_REMOVE_PAGE]]]
>  9. [F] DataPageSetFreeListPageRecord [freeListPage=0, super=PageDeltaRecord [grpId=94416770,
pageId=0001000000000005, super=WALRecord [size=37, chainSize=0, pos=null, type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
> 10. [F] PagesListAddPageRecord [dataPageId=0001000000000005, super=PageDeltaRecord [grpId=94416770,
pageId=0001000000000006, super=WALRecord [size=37, chainSize=0, pos=null, type=PAGES_LIST_ADD_PAGE]]]
> 11. [F] DataPageSetFreeListPageRecord [freeListPage=281474976710662, super=PageDeltaRecord
[grpId=94416770, pageId=0001000000000005, super=WALRecord [size=37, chainSize=0, pos=null,
type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
> {code}
> If you sum all space required for operation (size in p.3 is shown incorrectly here),
you will see that data update required ~300 bytes, so do free list update! 
> *Proposed solution*
> 1) Optionally do not write free list updates to WAL
> 2) In case of node restart we start with empty free lists, so data inserts will have
to allocate new pages
> 3) When old data page is read, add it to the free list
> 4) Start a background thread which will iterate over all old data pages and re-create
the free list, so that eventually all data pages are tracked.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Mime
View raw message