ignite-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alexei Scherbakov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-9612) Improve checkpoint mark phase speed.
Date Thu, 27 Sep 2018 16:52:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-9612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16630721#comment-16630721
] 

Alexei Scherbakov commented on IGNITE-9612:
-------------------------------------------

The ticket requires fix for concurrency issue with PartitionAllocationMap - it uses thread-unsafe
collections

Fixed with commit [1]

[1] 4527fe6de8bddea31a8142e78a84eacb534d0384

> Improve checkpoint mark phase speed.
> ------------------------------------
>
>                 Key: IGNITE-9612
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9612
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Alexei Scherbakov
>            Assignee: Alexei Scherbakov
>            Priority: Major
>             Fix For: 2.7
>
>
> I'm observing regular slow checkpoints due to long mark duration, which is not related
to dirty pages number:
> {noformat}
> 2018-09-01 14:55:20.408 [INFO ][db-checkpoint-thread-#241%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.p.GridCacheDatabaseSharedManager]
Checkpoint started [checkpointId=01e0c7bf-842f-4ed6-8589-b4904063434f, startPtr=FileWALPointer
[idx=19814, fileOff=948996096, len=5233457],
> checkpointLockWait=0ms, checkpointLockHoldTime=951ms, walCpRecordFsyncDuration=39ms,
pages=78477, reason='timeout']
> 2018-09-01 14:55:21.307 [INFO ][db-checkpoint-thread-#241%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.p.GridCacheDatabaseSharedManager]
Checkpoint finished [cpId=01e0c7bf-842f-4ed6-8589-b4904063434f, pages=78477, markPos=FileWALPointer
[idx=19814, fileOff=948996096, len=5233457], walSegmentsCleared=0, walSegmentsCovered=[],
*markDuration=1002m*s, pagesWrite=478ms, fsync=421ms, total=1901ms] 
> {noformat}
> {noformat}
> 2018-09-01 14:58:20.355 [INFO ][db-checkpoint-thread-#241%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.p.GridCacheDatabaseSharedManager]
Checkpoint started [checkpointId=09d1f4bc-d3f3-4a16-b291-89d7fa745ea5, startPtr=FileWALPointer
[idx=19814, fileOff=1000024208, len=5233457], checkpointLockWait=0ms, checkpointLockHoldTime=926ms,
walCpRecordFsyncDuration=14ms, pages=10837, reason='timeout']
> 2018-09-01 14:58:20.480 [INFO ][db-checkpoint-thread-#241%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.p.GridCacheDatabaseSharedManager]
Checkpoint finished [cpId=09d1f4bc-d3f3-4a16-b291-89d7fa745ea5, pages=10837, markPos=FileWALPointer
[idx=19814, fileOff=1000024208, len=5233457], walSegmentsCleared=0, walSegmentsCovered=[],
*markDuration=943ms*, pagesWrite=64ms, fsync=61ms, total=1068ms]
> {noformat}
> Debugging has revealed what this is due to large amount of work required to save metadata
for metapages and free/reuse lists. Because this is done under checkpoint write lock, all
other activities are blocked, resulting in increased tx and atomic ops latency.
> Simple solution: parallelize metadata processing during mark phase.
> Best way to solve the problem is described in IGNITE-9520.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message