db-derby-dev mailing list archives

From "Knut Anders Hatlen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-3347) ERROR XSDB3: Container information cannot change once written
Date Tue, 08 Apr 2008 11:55:24 GMT

    [ https://issues.apache.org/jira/browse/DERBY-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12586767#action_12586767 ]

Knut Anders Hatlen commented on DERBY-3347:

I think I see how this can happen.

When we read or write through the page cache, we use FileChannels to
allow multiple threads to access the same data file in parallel (after
DERBY-801). The first alloc page in a file can also be accessed
directly by the container object (via the container cache), in which
case RandomAccessFile.seek() + read/write is used instead of a
FileChannel.

Theoretically, this should work because the FileChannel operations use
absolute positions and should therefore not influence the positioning
of the read/write performed by the container object, and
synchronization ensures that RAF.seek+read/write only happens in one
thread at a time. However, it seems like on some platforms[1] the
position of the RandomAccessFile object can be changed when operations
using absolute positioning are performed on the FileChannel object
returned by getChannel() on the RAF. I guess this is to be expected,
since FileChannel's javadoc only guarantees that the operations on a
FileChannel object are thread safe, not that a mix of operations on
FileChannel objects and RandomAccessFile objects is thread safe.
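
One way to probe for this on a given platform is sketched below. This is
not Derby code: the class name RafPositionProbe and its method are
hypothetical, and the file offsets are arbitrary. One thread keeps doing
absolute-position reads on the FileChannel while another repeatedly
seeks the RandomAccessFile to 0 and checks the file pointer. A non-zero
count would suggest that the two objects share position state in a way
that is not thread safe; a count of zero proves nothing, since the
interleaving is timing dependent.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical probe, not part of Derby.
final class RafPositionProbe {

    /**
     * Counts how often getFilePointer() differs from 0 right after
     * seek(0) while a second thread performs absolute-position reads
     * on the channel obtained from the same RandomAccessFile.
     */
    static int countPointerMismatches(Path file, int iterations)
            throws IOException, InterruptedException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            FileChannel channel = raf.getChannel();
            AtomicBoolean stop = new AtomicBoolean(false);

            // Background thread: absolute reads only, never seeks.
            Thread reader = new Thread(() -> {
                ByteBuffer buf = ByteBuffer.allocate(512);
                try {
                    while (!stop.get()) {
                        buf.clear();
                        channel.read(buf, 2048L); // arbitrary offset
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            reader.start();

            int mismatches = 0;
            try {
                for (int i = 0; i < iterations; i++) {
                    raf.seek(0L);
                    if (raf.getFilePointer() != 0L) {
                        mismatches++;
                    }
                }
            } finally {
                // Stop the reader before the file is closed.
                stop.set(true);
                reader.join();
            }
            return mismatches;
        }
    }
}
```

To try it, call countPointerMismatches() on any file of at least a few
kilobytes with a large iteration count and compare the result across
platforms.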

So what I think is happening is that the checkpoint code is cleaning
the container cache. At the same time, the page cache initiates the
cleaning of a page in the same container as the container cache is
cleaning, resulting in a FileChannel operation which changes the
position of the RandomAccessFile used by the container cache. The
subsequent call to RandomAccessFile.readFully() therefore reads data
from the wrong position in the file. When the container cache later
attempts to write the data back to the file, the inconsistency is
detected and the XSDB3 error is raised.

I believe that we can fix this issue by rewriting the parts that don't
yet use FileChannel, so that FileChannel is used consistently and the
thread-safety guarantees in FileChannel's javadoc apply.
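
A minimal sketch of what such a rewrite could look like, assuming the
header is read and written as a plain byte range at a known offset. The
class ChannelHeaderIO and its methods are hypothetical and are not
Derby's actual code. The helpers replace seek() + readFully()/write()
with absolute-position FileChannel calls, looping because a single
read() or write() may transfer fewer bytes than requested.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical helper, not part of Derby.
final class ChannelHeaderIO {

    /** Fills dst from the channel, starting at the given file position. */
    static void readFully(FileChannel ch, ByteBuffer dst, long position)
            throws IOException {
        while (dst.hasRemaining()) {
            int n = ch.read(dst, position);
            if (n < 0) {
                throw new EOFException(
                    "Reached end of file at position " + position);
            }
            position += n;
        }
    }

    /** Writes all remaining bytes of src at the given file position. */
    static void writeFully(FileChannel ch, ByteBuffer src, long position)
            throws IOException {
        while (src.hasRemaining()) {
            position += ch.write(src, position);
        }
    }
}
```

Neither helper calls seek() or depends on the channel's current
position, so all access to the file goes through the operations that
FileChannel's javadoc describes as safe for concurrent use.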

[1] Some platforms == Windows. I've also tested it on Linux and
Solaris, and there the position on the RandomAccessFile doesn't seem
to be affected by operations using absolute positions.

> ERROR XSDB3: Container information cannot change once written
> -------------------------------------------------------------
>                 Key: DERBY-3347
>                 URL: https://issues.apache.org/jira/browse/DERBY-3347
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions:,
>         Environment: Windows 2003 Server
> Sun Java 1.6.0_03
>            Reporter: Bogdan Calmac
>            Priority: Critical
> We are using derby as an embedded DB for our data collection server. During an endurance
test when we do around 270 inserts and 9 updates per second, for about a week, I occasionally
see the error below in the derby log (and nothing else besides this).
> This is a vanilla installation, we run derby embedded with no extra configuration.  I
can confirm that there is no memory problem, the heap usage seems constant over time.
> Can somebody provide some more information regarding the effects of this error? By looking
at the stacktrace, it looks like a checkpoint operation is aborted due to some inconsistency
in the internal data structure. If the error does not repeat immediately, does it mean that
the next checkpoint is successful and there is no data loss? 
> I can't provide a test case for that, the error happens after about 1-2 days of running
our software. I will rerun the test with the debug jars to capture the line numbers in the
stacktrace.  Also, I'm starting another test with, to see if this problem was introduced
in the latest version.
> There are two other bugs referring to this error (https://issues.apache.org/jira/browse/DERBY-2284
and https://issues.apache.org/jira/browse/DERBY-3087) but they seem to happen in response
to some client action. This use case is a bit different, the client keeps inserting and updating
records for several days in a steady manner and at some point the error pops up.
> And lastly, here is the exception:
> Checkpoint Daemon caught standard exception
> ------------  BEGIN ERROR STACK -------------
> ERROR XSDB3: Container information cannot change once written: was 0, now 80
> 	at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
> 	at org.apache.derby.impl.store.raw.data.AllocPage.WriteContainerInfo(Unknown Source)
> 	at org.apache.derby.impl.store.raw.data.FileContainer.writeHeader(Unknown Source)
> 	at org.apache.derby.impl.store.raw.data.RAFContainer.writeRAFHeader(Unknown Source)
> 	at org.apache.derby.impl.store.raw.data.RAFContainer.clean(Unknown Source)
> 	at org.apache.derby.impl.services.cache.CachedItem.clean(Unknown Source)
> 	at org.apache.derby.impl.services.cache.Clock.cleanCache(Unknown Source)
> 	at org.apache.derby.impl.services.cache.Clock.cleanAll(Unknown Source)
> 	at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.checkpoint(Unknown Source)
> 	at org.apache.derby.impl.store.raw.log.LogToFile.checkpointWithTran(Unknown Source)
> 	at org.apache.derby.impl.store.raw.log.LogToFile.checkpoint(Unknown Source)
> 	at org.apache.derby.impl.store.raw.RawStore.checkpoint(Unknown Source)
> 	at org.apache.derby.impl.store.raw.log.LogToFile.performWork(Unknown Source)
> 	at org.apache.derby.impl.services.daemon.BasicDaemon.serviceClient(Unknown Source)
> 	at org.apache.derby.impl.services.daemon.BasicDaemon.work(Unknown Source)
> 	at org.apache.derby.impl.services.daemon.BasicDaemon.run(Unknown Source)
> 	at java.lang.Thread.run(Thread.java:619)
> ------------  END ERROR STACK -------------

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
