db-derby-dev mailing list archives

From "Rick Hillegas (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DERBY-5234) Unable to insert data into table. Failed due to "ERROR XSDG0: Page Page(51919,Container(0, 1104)) could not be read from disk."
Date Wed, 02 May 2012 15:54:52 GMT

     [ https://issues.apache.org/jira/browse/DERBY-5234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rick Hillegas updated DERBY-5234:
---------------------------------

    Attachment: derby-5234-01-aa-emptyAllocPage.diff

Attaching derby-5234-01-aa-emptyAllocPage.diff. These small changes make the repro run correctly.
Regression tests pass cleanly on this patch.

I have stumbled across at least 3 separate problems in the compression code. However, that
may simply mean that I don't understand the code. The 3 problems are:

1) A boundary checking error which causes an allocation extent to think that it still has
pages, even though those pages have been released to the operating system. This is what causes
the repro to fail.

2) A confusion about whether a variable represents a bit position or a page number. This prevents
the code from recognizing that all of the pages in an extent have been released. Fixing this
check does not change any user-visible behavior, but I think it is a step in the right direction.

3) The inability of the compression code to release pages held by the first allocation page.
I don't understand this problem yet. Before looking into this one, I need advice about whether
I am heading in the right direction.

More information about these 3 problems follows:

-------------------

Concerning (1), the boundary check which causes the repro to fail:

In AllocExtent.compressPages(), the new_highest_page argument can be -1. This happens if all
of the pages in the extent turn out to be empty. However, if new_highest_page is -1, the
code does not fall into the block at line 577; that is the code which actually marks the
pages as released. The value of new_highest_page is calculated by AllocExtent.compress().
The name new_highest_page is confusing: it is a bit position, not a page number, and when
it is -1 it is a flag meaning that all pages are empty. AllocExtent.compress() returns
new_highest_page + 1, triggering its caller to fall into a block at line 1074 in AllocPage.compress();
that block releases pages to the operating system. That is how we end up in the situation
where the pages are actually released but AllocExtent still thinks they are allocated. That,
in turn, is what tricks a later INSERT into trying to write to a non-existent page.

The fix is to make the code fall into the block at 577 if new_highest_page is -1.
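
To make the boundary condition concrete, here is a minimal sketch, assuming a BitSet as a
stand-in for the extent's allocation bitmap. This is not the actual Derby source; only the
-1 sentinel and the skipped release block are modeled:

    import java.util.BitSet;

    // Illustrative only: new_highest_page is a bit position within the
    // extent's allocation bitmap; -1 is the sentinel meaning "every page
    // in the extent is empty".
    final class AllocExtentSketch
    {
        private final BitSet allocatedPageBits = new BitSet();

        void compressPages(int new_highest_page)
        {
            // Buggy shape, shown for contrast: the release block ran only
            // for non-negative values, so the sentinel -1 skipped it and
            // left bits set for pages already returned to the OS:
            //
            //     if (new_highest_page >= 0)
            //         allocatedPageBits.clear(new_highest_page + 1,
            //                                 allocatedPageBits.length());
            //
            // Fixed shape: -1 also falls into the block, so clear(0, ...)
            // wipes the whole extent and the bookkeeping matches the
            // truncated file.
            int from = new_highest_page + 1;
            if (from <= allocatedPageBits.length())
                allocatedPageBits.clear(from, allocatedPageBits.length());
        }
    }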


-------------------

Concerning (2), the confusion about whether AllocExtent.compress() returns a bit position
or a page number:

At line 1080 in AllocPage.compress(), the code compares a bit position to a page number. Bit
positions are small integers, e.g., in the range 0-200. Page numbers are potentially much larger
integers in, say, the range 12000-12200. The weird comparison at line 1080 prevents AllocPage.compress()
from recognizing that all of the pages in the extent have been released.

I have renamed last_valid_page to last_valid_page_bit to clarify that this is a bit position,
not a page number. And I have changed the check at 1080 to compare the bit position to another
bit position. This comparison deserves the attention of someone who knows this code better
than I do. Is this the right comparison?
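
As a tiny runnable illustration of the unit mismatch (the mapping from bit position to page
number below is an assumption for illustration, not the Derby source):

    // Illustrative only: comparing unlike units always takes the same branch.
    public class BitVsPageSketch
    {
        public static void main(String[] args)
        {
            long extentStartPage  = 12000L; // first page number in the extent (assumed mapping)
            int  lastValidPageBit = 150;    // bit position inside the extent's bitmap
            long candidatePage    = 12160L; // a page number handed in by a caller

            // Buggy shape: a bit position (0-200) is always smaller than a
            // page number (~12000-12200), so the comparison never detects
            // that all pages in the extent have been released.
            System.out.println("bit vs page: " + (lastValidPageBit < candidatePage));

            // Corrected shape: convert to the same unit first, then compare.
            long candidateBit = candidatePage - extentStartPage;
            System.out.println("bit vs bit:  " + (lastValidPageBit < candidateBit));
        }
    }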

In a follow-on cleanup issue, it might make sense to change variable names in the allocation
code to clarify what is a bit number and what is a page number. This may disclose other questionable
code in this area.


-------------------

Concerning (3), the inability of the compression code to release empty pages managed by the
first allocation page:

I had hoped that the change for (2) would cause the compress to release more space. But it
didn't. The compress only releases the pages managed by the second (last) allocation page.
All of the pages managed by the first allocation page are also empty, but they are not released.
This seems wrong to me. I would expect the file to shrink back to its initial size.

Before pursuing this follow-on issue, I would like advice about whether I am headed in the
right direction. Should the compress shrink the file back to its initial size? Or should
SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE('APP', 'OPERATIONS', 0, 0, 1) just release empty
pages managed by the second and higher allocation pages?
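
For reference, the compress call under discussion can be issued over JDBC as follows. The
connection URL is a placeholder; the procedure and its SMALLINT parameters are the standard
SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE signature:

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    public class CompressTableExample
    {
        public static void main(String[] args) throws SQLException
        {
            try (Connection conn =
                     DriverManager.getConnection("jdbc:derby:myDB"); // placeholder URL
                 CallableStatement cs = conn.prepareCall(
                     "CALL SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE(?, ?, ?, ?, ?)"))
            {
                cs.setString(1, "APP");
                cs.setString(2, "OPERATIONS");
                cs.setShort(3, (short) 0); // PURGE_ROWS: off
                cs.setShort(4, (short) 0); // DEFRAGMENT_ROWS: off
                cs.setShort(5, (short) 1); // TRUNCATE_END: release empty pages at the end
                cs.execute();
            }
        }
    }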


-------------------

Touches the following files:

M       java/engine/org/apache/derby/impl/store/raw/data/AllocExtent.java

Fix for (1).

----------

M       java/engine/org/apache/derby/impl/store/raw/data/AllocPage.java

Clarification for (2).

                
> Unable to insert data into table. Failed due to "ERROR XSDG0: Page Page(51919,Container(0, 1104)) could not be read from disk."
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-5234
>                 URL: https://issues.apache.org/jira/browse/DERBY-5234
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.5.3.0
>         Environment: HP-UX 11iv2 in production environment with JDK 1.6; Solaris 5/10 in test environment with JDK 1.6
>            Reporter: Varma R
>            Priority: Critical
>              Labels: ERROR, XSDG0, apache, corruption, data, derby, derby_triage10_9
>         Attachments: 5234_alloc.out, 5234_page_10219.out, 5234_summary.out, DataFileReader_Output.zip, DbCompressErrorTester.java, derby-5234-01-aa-emptyAllocPage.diff, log191.dat, log85.dat
>
>
> One of the Derby database tables "gets corrupted"/"indicates connection not available" while
processing inserts from a Java client application, as shown in the trace, and the only way
to recover from this error is to rebuild the DB by deleting the data and creating the tables
again. This happens once in a while (thrice in a span of two months). The Java application
(run on multiple servers), which updates the database, processes around 100 million transactions
per hour in total, and each transaction results in 4-5 updates to the DB.
> There are eight tables in the derby database.
>    TABLE NAME            ROW COUNT (at time of corruption)
>    ------------------    ---------------------------------
>    KPI.KPI_MERGEIN       362917
>    KPI.KPI_IN            422508
>    KPI.KPI_DROPPED       53667
>    KPI.KPI_ERROR1        0
>    KPI.KPI_ERROR2        2686
>    KPI.KPI_ERRORMERGE    0
>    KPI.KPI_MERGEOUT      362669
>    KPI.KPI_OUT           125873
> The derby database has been started with the following parameters:
> CMD="java -Dderby.system.home=$DERBY_OPTS -Dderby.locks.monitor=true -Dderby.locks.deadlockTrace=true -Dderby.locks.escalationThreshold=50000 -Dderby.locks.waitTimeout=-1 -Dderby.storage.pageCacheSize=100000 -Xms512M -Xmx3072M -XX:NewSize=256M -classpath $DERBY_CLASSPATH org.apache.derby.drda.NetworkServerControl start -h $KPIDERBYHOST -p $DERBY_KPI_PORT"
> The corrupted database tar (filesystem) from the live environment was moved to a test system
(Solaris) and a few checks were run on the corrupted DB as part of the analysis (the DB does
start fine).
> Inserting a row into any table except KPI.KPI_MERGEIN succeeds. But when a new row is inserted
into the KPI.KPI_MERGEIN table using the command line tool, it throws the error message below
(the same message that appeared in the live environment):
> ij> INSERT INTO KPI.KPI_MERGEIN (A0_TXN_ID, A1_NE_ID, A2_CHU_IP_ADDR, A3_BATCH_DATE, A5_CODE) VALUES (-1, 'BMTDE', '192.2.1.3', 231456879, 'KSD');
> ERROR 08006: A network protocol error was encountered and the connection has been terminated: the requested command encountered an unarchitected and implementation-specific condition for which there was no architected message
> The derby.log file shows the error stack trace below:
> ERROR XSDG0: Page Page(51919,Container(0, 1104)) could not be read from disk.
>         at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.CachedPage.readPage(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.CachedPage.setIdentity(Unknown Source)
>         at org.apache.derby.impl.services.cache.ConcurrentCache.find(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.FileContainer.initPage(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.FileContainer.newPage(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.BaseContainer.addPage(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.BaseContainerHandle.addPage(Unknown Source)
>         at org.apache.derby.impl.store.access.heap.HeapController.doInsert(Unknown Source)
>         at org.apache.derby.impl.store.access.heap.HeapController.insertAndFetchLocation(Unknown Source)
>         at org.apache.derby.impl.sql.execute.RowChangerImpl.insertRow(Unknown Source)
>         at org.apache.derby.impl.sql.execute.InsertResultSet.normalInsertCore(Unknown Source)
>         at org.apache.derby.impl.sql.execute.InsertResultSet.open(Unknown Source)
>         at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(Unknown Source)
>         at org.apache.derby.impl.sql.GenericPreparedStatement.execute(Unknown Source)
>         at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
>         at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>         at org.apache.derby.impl.jdbc.EmbedStatement.executeUpdate(Unknown Source)
>         at org.apache.derby.impl.drda.DRDAConnThread.parseEXCSQLIMM(Unknown Source)
>         at org.apache.derby.impl.drda.DRDAConnThread.processCommands(Unknown Source)
>         at org.apache.derby.impl.drda.DRDAConnThread.run(Unknown Source)
> Caused by: java.io.EOFException: Reached end of file while attempting to read a whole page.
>         at org.apache.derby.impl.store.raw.data.RAFContainer4.readFull(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.RAFContainer4.readPage0(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.RAFContainer4.readPage(Unknown Source)
>         ... 20 more
> ============= begin nested exception, level (1) ===========
> java.io.EOFException: Reached end of file while attempting to read a whole page.
>         at org.apache.derby.impl.store.raw.data.RAFContainer4.readFull(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.RAFContainer4.readPage0(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.RAFContainer4.readPage(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.CachedPage.readPage(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.CachedPage.setIdentity(Unknown Source)
>         at org.apache.derby.impl.services.cache.ConcurrentCache.find(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.FileContainer.initPage(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.FileContainer.newPage(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.BaseContainer.addPage(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.BaseContainerHandle.addPage(Unknown Source)
>         at org.apache.derby.impl.store.access.heap.HeapController.doInsert(Unknown Source)
>         at org.apache.derby.impl.store.access.heap.HeapController.insertAndFetchLocation(Unknown Source)
>         at org.apache.derby.impl.sql.execute.RowChangerImpl.insertRow(Unknown Source)
>         at org.apache.derby.impl.sql.execute.InsertResultSet.normalInsertCore(Unknown Source)
>         at org.apache.derby.impl.sql.execute.InsertResultSet.open(Unknown Source)
>         at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(Unknown Source)
>         at org.apache.derby.impl.sql.GenericPreparedStatement.execute(Unknown Source)
>         at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
>         at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>         at org.apache.derby.impl.jdbc.EmbedStatement.executeUpdate(Unknown Source)
>         at org.apache.derby.impl.drda.DRDAConnThread.parseEXCSQLIMM(Unknown Source)
>         at org.apache.derby.impl.drda.DRDAConnThread.processCommands(Unknown Source)
>         at org.apache.derby.impl.drda.DRDAConnThread.run(Unknown Source)
> ============= end nested exception, level (1) ===========
> 2011-05-16 10:37:21.392 GMT:
> Shutting down instance a816c00e-012f-f85f-7892-ffff874c3ff6
> ----------------------------------------------------------------
> Cleanup action completed
> The problem is only with the INSERT statement. When I try a SELECT statement on the KPI.KPI_MERGEIN
table, it works fine. The database file system size (in seg0) is 1.3 GB.
> Can anyone help me identify why this one table alone throws the above error message? Would
an upgrade to a new version help?
