ignite-issues mailing list archives

From "Andrey Gura (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-11687) Concurrent WAL replay & log may fail with CRC error on read
Date Tue, 16 Apr 2019 00:37:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-11687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16818481#comment-16818481 ]

Andrey Gura commented on IGNITE-11687:
--------------------------------------

[~agoncharuk] I've investigated the problem more deeply. Although the code snippet you pointed
out is incorrect and must be fixed, the test never executes it because MMAP mode is enabled by
default. I think the {{FileWriteHandleImpl#addRecord()}} method is the root of the problem. See
the following code snippet:

{code:java}
                    fillBuffer(buf, rec);

                    if (mmap) {
                        // written field must grow only, but segment with greater position
                        // can be serialized earlier than segment with smaller position.
                        while (true) {
                            long written0 = written;

                            if (seg.position() > written0) {
                                if (WRITTEN_UPD.compareAndSet(this, written0, seg.position()))
                                    break;
                            }
                            else
                                break;
                        }
                    }

                    return ptr;
{code}

On a {{wal.replay()}} call, the WAL iterator reads the {{hnd.written}} field value while some
WAL record located before this position is still not fully serialized. What do you think?
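
To make the suspected interleaving concrete, here is a minimal single-threaded sketch. It is not Ignite code: {{WrittenWatermarkDemo}}, {{fill()}}, {{publish()}} and {{firstGapBelowWritten()}} are hypothetical names, and the per-byte {{filled}} array only stands in for "this part of the buffer has been serialized". The {{publish()}} loop mirrors the grow-only CAS from the snippet above.

```java
import java.util.concurrent.atomic.AtomicLongFieldUpdater;

/** Hypothetical stand-in for a WAL write handle; NOT Ignite's FileWriteHandleImpl. */
class WrittenWatermarkDemo {
    private static final AtomicLongFieldUpdater<WrittenWatermarkDemo> WRITTEN_UPD =
        AtomicLongFieldUpdater.newUpdater(WrittenWatermarkDemo.class, "written");

    /** Highest end offset published by any writer so far (grow-only). */
    private volatile long written;

    /** Assumption for illustration: one flag per byte, true once that byte is serialized. */
    private final boolean[] filled;

    WrittenWatermarkDemo(int capacity) {
        filled = new boolean[capacity];
    }

    /** Simulates fillBuffer(): marks the range [start, end) as fully serialized. */
    void fill(int start, int end) {
        for (int i = start; i < end; i++)
            filled[i] = true;
    }

    /** Grow-only CAS update, same shape as the loop in addRecord(). */
    void publish(long segEnd) {
        while (true) {
            long written0 = written;

            // Stop if the value is already large enough or our CAS succeeded.
            if (segEnd <= written0 || WRITTEN_UPD.compareAndSet(this, written0, segEnd))
                break;
        }
    }

    long written() {
        return written;
    }

    /** Returns the first offset below 'written' that is not serialized yet, or -1 if none. */
    long firstGapBelowWritten() {
        long end = written;

        for (int i = 0; i < end; i++) {
            if (!filled[i])
                return i;
        }

        return -1;
    }

    public static void main(String[] args) {
        WrittenWatermarkDemo hnd = new WrittenWatermarkDemo(32);

        // Writer A reserved [0, 10), writer B reserved [10, 20). B finishes first.
        hnd.fill(10, 20);
        hnd.publish(20);

        // A reader sampling 'written' now sees 20, although [0, 10) is still empty.
        System.out.println("written=" + hnd.written() + ", first gap=" + hnd.firstGapBelowWritten());

        // Writer A finishes later; 'written' stays at 20 and the gap disappears.
        hnd.fill(0, 10);
        hnd.publish(10);
        System.out.println("written=" + hnd.written() + ", first gap=" + hnd.firstGapBelowWritten());
    }
}
```

In this sketch a reader that samples {{written}} between the two {{publish()}} calls gets an end position that covers a record which is not serialized yet, which is the window I suspect the replay iterator falls into.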

> Concurrent WAL replay & log may fail with CRC error on read
> -----------------------------------------------------------
>
>                 Key: IGNITE-11687
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11687
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexey Goncharuk
>            Assignee: Andrey Gura
>            Priority: Critical
>             Fix For: 2.8
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The cause is the way {{end}} is calculated for WAL iterator:
> {code}
> if (hnd != null)
>     end = hnd.position();
> {code}
> {code}
>     @Override public FileWALPointer position() {
>         lock.lock();
>         try {
>             return new FileWALPointer(getSegmentId(), (int)written, 0);
>         }
>         finally {
>             lock.unlock();
>         }
>     }
> {code}
> Consider a partially written entry. In this case, {{written}} has already been updated, so a
> concurrent WAL replay will attempt to read the incompletely written record, and since {{end}}
> is not null, the iterator will fail with a CRC error.
> The issue can occasionally be reproduced by {{IgniteWalSerializerVersionTest}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
