db-derby-dev mailing list archives

From "David Sitsky (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DERBY-7034) Derby's sync() handling can lead to database corruption (at least on Linux)
Date Mon, 11 Feb 2019 08:20:00 GMT
David Sitsky created DERBY-7034:
-----------------------------------

             Summary: Derby's sync() handling can lead to database corruption (at least on Linux)
                 Key: DERBY-7034
                 URL: https://issues.apache.org/jira/browse/DERBY-7034
             Project: Derby
          Issue Type: Bug
          Components: Store
            Reporter: David Sitsky


I recently read about "fsyncgate 2018", which the Postgres team raised: https://wiki.postgresql.org/wiki/Fsync_Errors.
https://lwn.net/Articles/752063/ has a good overview of the issue relating to fsync() behaviour
on Linux.  The short summary: on some versions of Linux, if you retry fsync() after it has failed,
the retry will appear to succeed and you will end up with corrupted data on disk.

From a quick glance at the Derby code, I have already found two places where sync() is retried
in a loop, which is clearly dangerous.  There could be other areas too.

In LogAccessFile:
{code}
    /**
     * Guarantee that all writes up to the last call to flushLogAccessFile()
     * have hit disk.
     * <p>
     * A call for clients of LogAccessFile to ensure that all data written
     * up to the last call to flushLogAccessFile() is written to disk.
     * This call will not return until those writes have hit disk.
     * <p>
     * Note that this routine may block waiting for I/O to complete, so
     * callers should limit the number of resources held locked while this
     * operation is called.
     **/
    public void syncLogAccessFile() 
        throws IOException, StandardException
    {
        for( int i=0; ; )
        {
            // 3311: JVM sync call sometimes fails under high load against NFS 
            // mounted disk.  We re-try to do this 20 times.
            try
            {
                synchronized( this)
                {
                    log.sync();
                }

                // the sync succeeded, so return
                break;
            }
            catch( SyncFailedException sfe )
            {
                i++;
                try
                {
                    // wait for .2 of a second, hopefully I/O is done by now
                    // we wait a max of 4 seconds before we give up
                    Thread.sleep( 200 ); 
                }
                catch( InterruptedException ie )
                {
                    InterruptStatus.setInterrupted();
                }

                if( i > 20 )
                    throw StandardException.newException(
                        SQLState.LOG_FULL, sfe);
            }
        }
    }
{code}
And LogToFile has similar retry code, though it catches the broader IOException rather than SyncFailedException:
{code}
    /**
     * Utility routine to call sync() on the input file descriptor.
     * <p> 
    */
    private void syncFile( StorageRandomAccessFile raf) 
        throws StandardException
    {
        for( int i=0; ; )
        {
            // 3311: JVM sync call sometimes fails under high load against NFS 
            // mounted disk.  We re-try to do this 20 times.
            try
            {
                raf.sync();

                // the sync succeeded, so return
                break;
            }
            catch (IOException ioe)
            {
                i++;
                try
                {
                    // wait for .2 of a second, hopefully I/O is done by now
                    // we wait a max of 4 seconds before we give up
                    Thread.sleep(200);
                }
                catch( InterruptedException ie )
                {   
                    InterruptStatus.setInterrupted();
                }

                if( i > 20 )
                {
                    throw StandardException.newException(
                                SQLState.LOG_FULL, ioe);
                }
            }
        }
    }
{code}

It seems Postgres, MySQL and MongoDB have already changed their code to "panic" if an error
is returned by an fsync() call.
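For illustration, a fail-fast alternative could look something like the sketch below. This is not Derby code; the class and method names are hypothetical, and it only shows the shape of the idea: treat the first sync failure as fatal (forcing a shutdown and recovery from the log) instead of retrying against a page cache whose dirty pages may already have been dropped.

{code}
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: fail fast on the first sync failure, never retry. */
public class FailFastSync {
    /** Forces the caller to shut down and recover, rather than trust later fsync() calls. */
    public static class FatalSyncException extends RuntimeException {
        public FatalSyncException(Throwable cause) {
            super("sync failed; on-disk state is suspect", cause);
        }
    }

    public static void syncOrDie(FileChannel ch) {
        try {
            ch.force(true);  // fsync data and metadata
        } catch (IOException ioe) {
            // Do NOT retry: on some Linux kernels the error state is cleared
            // after being reported once, so a second fsync() can "succeed"
            // even though dirty pages were thrown away.
            throw new FatalSyncException(ioe);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("failfast", ".log");
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
            syncOrDie(ch);  // succeeds on a healthy local filesystem
        }
        Files.deleteIfExists(tmp);
        System.out.println("sync ok");
    }
}
{code}

Whether Derby should crash the whole engine or just mark the database corrupt is a design question for people who know the store internals better than I do.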

There are many more complexities in how fsync() reports errors (if at all).  It is worth
digging into this further, as I am not familiar with Derby's internals and how affected it could
be by this.

Interestingly, people have indicated this issue is more likely to happen on network filesystems
(since write failures are more common when the network goes down), and in the past it was
easy just to say "NFS is broken", but in actual fact the problem was in some cases with fsync()
and how it was called in a loop.

I've been trying to find out whether Windows has similar issues, without much luck.  But given the
mysterious corruption issues I have seen in the past with Windows/CIFS, I do wonder if this
is related somehow.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
