ignite-issues mailing list archives

From "Dmitriy Govorukhin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-11749) Implement automatic pages history dump on CorruptedTreeException
Date Thu, 30 May 2019 20:48:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-11749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16852341#comment-16852341 ]

Dmitriy Govorukhin commented on IGNITE-11749:
---------------------------------------------

[~akalashnikov], [~ibessonov] Thanks for the contribution. Merged to master.

> Implement automatic pages history dump on CorruptedTreeException
> ----------------------------------------------------------------
>
>                 Key: IGNITE-11749
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11749
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Alexey Goncharuk
>            Assignee: Anton Kalashnikov
>            Priority: Major
>             Fix For: 2.8
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, the only way to debug possible bugs in the checkpointer/recovery mechanics is
to manually parse WAL files after the corruption has happened. This is impractical for several
reasons. First, it requires manual steps that depend on the content of the exception. Second,
it is not always possible to obtain the WAL files (they may contain sensitive data).
> We need to add to the exception handler a mechanism that dumps all information required
for primary analysis of the corruption. For example, if an exception happened when materializing
a link {{0xabcd}} written on an index page {{0xdcba}}, we need to dump the change history of both
pages and the checkpoint records within the analysis interval. Possibly, we should also include
the FreeList pages in which the aforementioned pages were included.
> Example of output:
> {noformat}
> [2019-05-07 11:57:57,350][INFO ][test-runner-#58%diagnostic.DiagnosticProcessorTest%][PageHistoryDiagnoster]
Next WAL record :: PageSnapshot [fullPageId = FullPageId [pageId=0002ffff00000000, effectivePageId=0000ffff00000000,
grpId=-2100569601], page = [
> Header [
> 	type=11 (PageMetaIO),
> 	ver=1,
> 	crc=0,
> 	pageId=844420635164672(offset=0, flags=10, partId=65535, index=0)
> ],
> PageMeta[
> 	treeRoot=844420635164675,
> 	lastSuccessfulFullSnapshotId=0,
> 	lastSuccessfulSnapshotId=0,
> 	nextSnapshotTag=1,
> 	lastSuccessfulSnapshotTag=0,
> 	lastAllocatedPageCount=0,
> 	candidatePageCount=0
> ]],
> super = [WALRecord [size=4129, chainSize=0, pos=FileWALPointer [idx=0, fileOff=103, len=4129],
type=PAGE_RECORD]]]
> Next WAL record :: CheckpointRecord [cpId=c6ba7793-113b-4b54-8530-45e1708ca44c, end=false,
cpMark=FileWALPointer [idx=0, fileOff=29, len=29], super=WALRecord [size=1963, chainSize=0,
pos=FileWALPointer [idx=0, fileOff=39686, len=1963], type=CHECKPOINT_RECORD]]
> Next WAL record :: PageSnapshot [fullPageId = FullPageId [pageId=0002ffff00000000, effectivePageId=0000ffff00000000,
grpId=-1368047378], page = [
> Header [
> 	type=11 (PageMetaIO),
> 	ver=1,
> 	crc=0,
> 	pageId=844420635164672(offset=0, flags=10, partId=65535, index=0)
> ],
> PageMeta[
> 	treeRoot=844420635164675,
> 	lastSuccessfulFullSnapshotId=0,
> 	lastSuccessfulSnapshotId=0,
> 	nextSnapshotTag=1,
> 	lastSuccessfulSnapshotTag=0,
> 	lastAllocatedPageCount=0,
> 	candidatePageCount=0
> ]],
> super = [WALRecord [size=4129, chainSize=0, pos=FileWALPointer [idx=0, fileOff=55961,
len=4129], type=PAGE_RECORD]]]
> Next WAL record :: CheckpointRecord [cpId=145e599e-66fc-45f5-bde4-b0c392125968, end=false,
cpMark=null, super=WALRecord [size=21409, chainSize=0, pos=FileWALPointer [idx=0, fileOff=13101788,
len=21409], type=CHECKPOINT_RECORD]]
> {noformat}
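
The sample output prints each page ID both as hex (pageId=0002ffff00000000) and as a decimal with decoded fields (pageId=844420635164672(offset=0, flags=10, partId=65535, index=0)). The sketch below shows how the two forms could be related when reading such a dump. It assumes the bit layout suggested by the sample (8 bits offset, 8 bits flags printed in binary, 16 bits partition ID, 32 bits page index); the class name PageIdFields and its methods are hypothetical and are not part of the Ignite code base.

```java
// Hypothetical helper for reading page IDs from the dump above.
// The bit layout is an assumption inferred from the sample output.
public final class PageIdFields {
    private PageIdFields() {}

    // Top 8 bits.
    static int offset(long pageId) {
        return (int) (pageId >>> 56) & 0xFF;
    }

    // Next 8 bits; the sample prints them in binary (flags=10 means 0x02).
    static int flags(long pageId) {
        return (int) (pageId >>> 48) & 0xFF;
    }

    // Next 16 bits; 65535 (0xffff) in the sample.
    static int partId(long pageId) {
        return (int) (pageId >>> 32) & 0xFFFF;
    }

    // Low 32 bits, kept as an unsigned value in a long.
    static long index(long pageId) {
        return pageId & 0xFFFFFFFFL;
    }

    // Formats a page ID in the same shape as the Header section of the dump.
    static String describe(long pageId) {
        return String.format("%d(offset=%d, flags=%s, partId=%d, index=%d)",
            pageId, offset(pageId), Integer.toBinaryString(flags(pageId)),
            partId(pageId), index(pageId));
    }

    public static void main(String[] args) {
        // Hex form taken from "FullPageId [pageId=0002ffff00000000, ...]".
        long metaPage = Long.parseLong("0002ffff00000000", 16);
        System.out.println(describe(metaPage));

        // Decimal form taken from "treeRoot=844420635164675".
        System.out.println(describe(844420635164675L));
    }
}
```

Under these assumptions, the treeRoot value from the PageMeta section decodes to the same partition (65535) with page index 3.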



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
