db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Knut Anders Hatlen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DERBY-5358) SYSCS_COMPRESS_TABLE failed with conglomerate not found exception
Date Thu, 31 May 2012 11:20:23 GMT

    [ https://issues.apache.org/jira/browse/DERBY-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286484#comment-13286484
] 

Knut Anders Hatlen commented on DERBY-5358:
-------------------------------------------

I think I found the problem that's causing this. There's a race condition in TableDescriptor.getHeapConglomerateId():

		/* If we've already cached the heap conglomerate number, then
		 * simply return it.
		 */
		if (heapConglomNumber != -1)
		{
			return heapConglomNumber;
		}

... (find the heap conglomerate in the list of conglomerates) ...

		heapConglomNumber = cd.getConglomerateNumber();

		return heapConglomNumber;

I instrumented this class and found that it never set heapConglomNumber to 4,294,967,295,
but the method still returned that value some times.

The problem is that heapConglomNumber is a long, and the Java spec doesn't guarantee that
reads/writes of long values are atomic.

So what seems to happen, is:

- Two threads (T1 and T2) call getHeapConglomerateId() on the same TableDescriptor at about
the same time, and no other calls to getHeapConglomerateId() have been made on that object
before, so heapConglomNumber is initially -1.

- T1 goes ahead finding the real conglomerate number and writing it to heapConglomNumber.

- At the same time, T2 reads heapConglomNumber in order to check if it's already cached. However,
since T1's write was not atomic, it only sees half of it. That's enough to make it see that
the cached conglomerate number is -1, so that it concludes that it can use it, but the number
it sees is not the right one.

If T2 happens to see only the most significant half of the conglomerate number written by
T1, that half will probably be all zeros (because it's not very likely that more than 4 billion
conglomerates have been created). The bits in the least significant half will in that case
be all ones (because the initial value is -1, which is all ones in two's complement). The
returned value will therefore be 0x00000000ffffffff == 4,294,967,295, as seen in the error
in the bug description.

I've also seen variants where the returned number is a negative one. That happens if T2 instead
sees the least significant half of the correct column number, and the most significant half
of the initial value -1. For example, if the conglomerate number is 344624, the error message
will say: The conglomerate (-4 294 622 672) requested does not exist.
                
> SYSCS_COMPRESS_TABLE failed with conglomerate not found exception
> -----------------------------------------------------------------
>
>                 Key: DERBY-5358
>                 URL: https://issues.apache.org/jira/browse/DERBY-5358
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.9.1.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>              Labels: derby_triage10_9
>
> When running the D4275.java repro attached to DERBY-4275 (with the patch invalidate-during-invalidation.diff
as well as the fix for DERBY-5161 to prevent the select thread from failing) in four parallel
processes on the same machine, one of the processes failed with the following stack trace:
> java.sql.SQLException: The exception 'java.sql.SQLException: The conglomerate (4,294,967,295)
requested does not exist.' was thrown while evaluating an expression.
>         at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:98)
>         at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Util.java:142)
>         at org.apache.derby.impl.jdbc.Util.seeNextException(Util.java:278)
>         at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(TransactionResourceImpl.java:407)
>         at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(TransactionResourceImpl.java:348)
>         at org.apache.derby.impl.jdbc.EmbedConnection.handleException(EmbedConnection.java:2290)
>         at org.apache.derby.impl.jdbc.ConnectionChild.handleException(ConnectionChild.java:82)
>         at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1334)
>         at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(EmbedPreparedStatement.java:1686)
>         at org.apache.derby.impl.jdbc.EmbedPreparedStatement.execute(EmbedPreparedStatement.java:1341)
>         at D4275.main(D4275.java:52)
> Caused by: java.sql.SQLException: The exception 'java.sql.SQLException: The conglomerate
(4,294,967,295) requested does not exist.' was thrown while evaluating an expression.
>         at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:45)
>         at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(SQLExceptionFactory40.java:122)
>         at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:71)
>         ... 10 more
> Caused by: java.sql.SQLException: The conglomerate (4,294,967,295) requested does not
exist.
>         at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:45)
>         at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(SQLExceptionFactory40.java:122)
>         at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:71)
>         at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Util.java:256)
>         at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(TransactionResourceImpl.java:400)
>         at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(TransactionResourceImpl.java:348)
>         at org.apache.derby.impl.jdbc.EmbedConnection.handleException(EmbedConnection.java:2290)
>         at org.apache.derby.impl.jdbc.ConnectionChild.handleException(ConnectionChild.java:82)
>         at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1334)
>         at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(EmbedPreparedStatement.java:1686)
>         at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(EmbedPreparedStatement.java:308)
>         at org.apache.derby.catalog.SystemProcedures.SYSCS_COMPRESS_TABLE(SystemProcedures.java:792)
>         at org.apache.derby.exe.acd381409ax0131x72b6x8e11x0000037164a81.g0(Unknown Source)
>         at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.derby.impl.services.reflect.ReflectMethod.invoke(ReflectMethod.java:46)
>         at org.apache.derby.impl.sql.execute.CallStatementResultSet.open(CallStatementResultSet.java:75)
>         at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(GenericPreparedStatement.java:448)
>         at org.apache.derby.impl.sql.GenericPreparedStatement.execute(GenericPreparedStatement.java:319)
>         at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1242)
>         ... 3 more
> Caused by: ERROR XSAI2: The conglomerate (4,294,967,295) requested does not exist.
>         at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:278)
>         at org.apache.derby.impl.store.access.RAMAccessManager.getFactoryFromConglomId(RAMAccessManager.java:382)
>         at org.apache.derby.impl.store.access.RAMAccessManager.conglomCacheFind(RAMAccessManager.java:482)
>         at org.apache.derby.impl.store.access.RAMTransaction.findExistingConglomerate(RAMTransaction.java:394)
>         at org.apache.derby.impl.store.access.RAMTransaction.openConglomerate(RAMTransaction.java:1308)
>         at org.apache.derby.impl.sql.execute.DDLConstantAction.lockTableForDDL(DDLConstantAction.java:252)
>         at org.apache.derby.impl.sql.execute.AlterTableConstantAction.executeConstantActionBody(AlterTableConstantAction.java:364)
>         at org.apache.derby.impl.sql.execute.AlterTableConstantAction.executeConstantAction(AlterTableConstantAction.java:275)
>         at org.apache.derby.impl.sql.execute.MiscResultSet.open(MiscResultSet.java:61)
>         at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(GenericPreparedStatement.java:448)
>         at org.apache.derby.impl.sql.GenericPreparedStatement.execute(GenericPreparedStatement.java:319)
>         at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1242)
>         ... 15 more
> Test stopped after 9342310 ms
> The conglomerate number 4,294,967,295 looks suspicious, as it's equal to 2^32-1. Perhaps
it's hitting some internal limit on the number of conglomerates? The test case used the in-memory
back-end.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

Mime
View raw message