Mailing-List: contact derby-dev-help@db.apache.org; run by ezmlm
Delivered-To: mailing list derby-dev@db.apache.org
Message-ID: <71448832.1227118004308.JavaMail.jira@brutus>
Date: Wed, 19 Nov 2008 10:06:44 -0800 (PST)
From: "Kathey Marsden (JIRA)"
To: derby-dev@db.apache.org
Subject: [jira] Updated: (DERBY-637) Conglomerate does not exist after inserting large data volume
In-Reply-To: <5379765.1129886875518.JavaMail.jira@ajax.apache.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

     [ https://issues.apache.org/jira/browse/DERBY-637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kathey Marsden updated DERBY-637:
---------------------------------

    Attachment: noContainerBug.java

Here is the old repro.
It will need some work to run with Derby, as compress table has changed, imports have changed, etc. 3653 has some interesting comments regarding the "fix", which seemed to just reduce the window of opportunity for this bug to occur. I don't know if things changed after 3653 or not. Below is the description and comments from the issue:

Description

An application forks 20 threads to update a table (inserts or
deletes, depending on the number of rows in the table).
When the number of rows falls to a low water mark, one thread
will do

    lock table x in exclusive mode

retrying until it succeeds, then

    alter table x compress

The other threads are blocked trying to get read locks, part
way through executing their plan.
Compress table, near the end of its work, invalidates plans on
this table, since the conglomerateId has changed for the
underlying store.
However, the blocked threads are already using their
invalid plans, and when they get the lock they get the error

    "Container {N} not found"

Notes:

I am not sure if this problem is already documented.
I submitted a "fix" which reduces the problem but does not
solve the known race problem with data dictionaries.
Instead of 14 errors we get 1 error now.

Person A wrote:

The test first does "lock table datatypes exclusive mode"
before starting the compress.
Some of us thought that if the compress had an
exclusive lock it would maybe solve things.

Here is the problem.
20 threads are running, either inserting or deleting
depending on how many rows there currently are in the table. If we go too high
we start deleting. When we drop below a low water mark, one thread does the
"lock table excl", then alter table compress. The other threads (19) are part
way into executing their delete and block getting a write lock. They are part
way into their query plan, right? Bytecode or before that?
Compress eventually finishes, the container and
conglomerate id change, the plans are invalidated, the test commits, and I assume the
lock is released at commit.
Now some of the updater threads get the lock in turn and
get the "Container {N} not found" error. 14 errors, not 19. Why
not all 19, I don't know. Then everyone must recompile, because there are no more
errors and we continue on.

The question is, is there a way
to recompile once you get your lock but notice your plan is
invalidated?
Is wait()/notify() used for the locks? Could we wake
them, telling them to check their plan validation?

Person B replied:

I think you have what is going on nailed, but I have no
ideas how to fix it.
I think this is a known language issue, but still
waiting on comment.
I think it is too late to stop and retry; if I am not
mistaken, an arbitrary query could have already begun returning rows to
the user when it encounters this error (maybe not in this case, but a
query with a complicated join may).

It seems the "right" thing to do is to get locks on all
tables in a plan up front before execution, and then check if the plan is
valid. I think this has been considered too major to do.

No other ideas at this point other than getting a test
case, logging a bug, and moving on.

Person C replied:

This is a classic race condition. The problem is that ALTER
TABLE COMPRESS gets its exclusive lock near the beginning of its
execution, but invalidates dependent plans near the end of its
execution.

We could either eliminate or narrow the window that allows
the race condition by moving plan invalidation to the beginning of
the execution of ALTER TABLE COMPRESS. We want it to be
impossible or unlikely that an inserter or deleter can start executing
with a conglomerate that's about to go away.
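To make the race concrete, here is a minimal single-threaded Java sketch. It is a toy model under stated assumptions, not Derby code: Table, Plan, compile, executeRacy and executeLockThenValidate are all hypothetical names invented for this illustration, and the bad interleaving is forced by running the compress at the point where an updater thread would be blocked on the lock. The first method uses a plan that was validated before the lock was obtained; the second follows the "get the lock, then check if the plan is valid" idea from the discussion.

```java
import java.util.concurrent.locks.ReentrantLock;

public class StalePlanSketch {

    /** Toy table: compress swaps in a new conglomerate id (hypothetical model, not Derby code). */
    static final class Table {
        final ReentrantLock lock = new ReentrantLock();
        private long conglomerateId = 1;

        long currentConglomerateId() { return conglomerateId; }

        /** Models compress: takes the exclusive lock, then replaces the conglomerate. */
        void compress() {
            lock.lock();
            try { conglomerateId++; } finally { lock.unlock(); }
        }

        /** Fails the way a stale plan does when its container is gone. */
        void open(long id) {
            if (id != conglomerateId) {
                throw new IllegalStateException("Container " + id + " not found");
            }
        }
    }

    /** Toy plan: remembers the conglomerate id it was compiled against. */
    static final class Plan {
        final long conglomerateId;
        Plan(long conglomerateId) { this.conglomerateId = conglomerateId; }
    }

    static Plan compile(Table t) { return new Plan(t.currentConglomerateId()); }

    /** Racy order: compile, get "blocked" (compress runs here), then use the old plan. */
    static String executeRacy(Table t, Runnable whileBlocked) {
        Plan plan = compile(t);
        whileBlocked.run();
        t.lock.lock();
        try {
            t.open(plan.conglomerateId);
            return "ok";
        } catch (IllegalStateException e) {
            return e.getMessage();
        } finally {
            t.lock.unlock();
        }
    }

    /** Lock first, then check the plan and recompile if it went stale. */
    static String executeLockThenValidate(Table t, Runnable whileBlocked) {
        Plan plan = compile(t);
        whileBlocked.run();
        t.lock.lock();
        try {
            if (plan.conglomerateId != t.currentConglomerateId()) {
                plan = compile(t);
            }
            t.open(plan.conglomerateId);
            return "ok";
        } finally {
            t.lock.unlock();
        }
    }

    public static void main(String[] args) {
        Table t = new Table();
        System.out.println(executeRacy(t, t::compress));             // Container 1 not found
        System.out.println(executeLockThenValidate(t, t::compress)); // ok
    }
}
```

The sketch only checks validity once, before any work is done. It does not address the concern raised above that a real query may already have returned rows to the user by the time it blocks, which is why retrying mid-execution was considered too late.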
Another possibility would be for the store to provide a way
for the new conglomerate to have the same conglomerate id as
the old conglomerate. The store would also have to take care of
any open conglomerate controllers and scans that used the old
conglomerate. I don't know the store well enough to say how
hard this would be, but I'm guessing it would be very hard.

Person B then replies:

This would be very hard for store. In all these cases of
swapping out the container and conglomerate, the id is the unit of recovery,
and using the "same" id for something that may have to be recovered is
hard.

Also, the same type of problem can come about if an index
exists on a table, and then is dropped. If the plan tries to use the
index after it has been dropped, there is nothing the store can do in that
case.

Moving the invalidation up seems like a good idea, but as
Person C points out it doesn't solve it if there is any time when another
thread can validate its plan and then start executing, block on a
lock, and when it wakes up find the plan is invalid.

And I made the change to move the invalidate before we start
moving rows from the old to the new table.
This helps the test, but does not solve the real problem.

Hope this helps.

> Conglomerate does not exist after inserting large data volume
> --------------------------------------------------------------
>
>                 Key: DERBY-637
>                 URL: https://issues.apache.org/jira/browse/DERBY-637
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.2.1.6
>         Environment: Solaris 10 Sparc
> Sun 1.5 VM
> Client/server DB
> 1 GB page cache
> JVM heap on server: min 1 GB, max 3 GB
>            Reporter: Øystein Grøvlen
>         Attachments: noContainerBug.java
>
>
> In a client/server environment I did as follows:
> 1. Started server
> 2. Dropped existing TPC-B tables and created new ones
> 3. Inserted data for 200 million accounts (30 GB account table)
> 4. When insertion was finished, tried to run a TPC-B transaction on same connection and was informed that conglomerate does not exist. (See stack trace below).
> 5. Stopped client, started a new client to run a TPC-B transaction, got same error
> 6. Restarted server
> 7. Ran client again, and everything worked fine.
> Stack trace from derby.log:
> 2005-10-19 18:47:41.838 GMT Thread[DRDAConnThread_3,5,main] (XID = 75504654), (SESSIONID = 0), (DATABASE = /export/home3/tmp/oysteing/tpcbdb), (DRDAID = NF000001.OB77-578992897558106193{1}), Cleanup action starting
> 2005-10-19 18:47:41.839 GMT Thread[DRDAConnThread_3,5,main] (XID = 75504654), (SESSIONID = 0), (DATABASE = /export/home3/tmp/oysteing/tpcbdb), (DRDAID = NF000001.OB77-578992897558106193{1}), Failed Statement is: UPDATE accounts SET abal = abal + ? WHERE aid = ? AND bid = ?
> ERROR XSAI2: The conglomerate (8,048) requested does not exist.
>     at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:311)
>     at org.apache.derby.impl.store.access.heap.HeapConglomerateFactory.readConglomerate(HeapConglomerateFactory.java:224)
>     at org.apache.derby.impl.store.access.RAMAccessManager.conglomCacheFind(RAMAccessManager.java:486)
>     at org.apache.derby.impl.store.access.RAMTransaction.findExistingConglomerate(RAMTransaction.java:389)
>     at org.apache.derby.impl.store.access.RAMTransaction.openConglomerate(RAMTransaction.java:1315)
>     at org.apache.derby.impl.store.access.btree.index.B2IForwardScan.init(B2IForwardScan.java:237)
>     at org.apache.derby.impl.store.access.btree.index.B2I.openScan(B2I.java:750)
>     at org.apache.derby.impl.store.access.RAMTransaction.openScan(RAMTransaction.java:530)
>     at org.apache.derby.impl.store.access.RAMTransaction.openScan(RAMTransaction.java:1582)
>     at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getDescriptorViaIndex(DataDictionaryImpl.java:7218)
>     at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getAliasDescriptor(DataDictionaryImpl.java:5697)
>     at org.apache.derby.impl.sql.compile.QueryTreeNode.resolveTableToSynonym(QueryTreeNode.java:1510)
>     at org.apache.derby.impl.sql.compile.UpdateNode.bind(UpdateNode.java:207)
>     at org.apache.derby.impl.sql.GenericStatement.prepMinion(GenericStatement.java:333)
>     at org.apache.derby.impl.sql.GenericStatement.prepare(GenericStatement.java:107)
>     at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(GenericLanguageConnectionContext.java:704)
>     at org.apache.derby.impl.jdbc.EmbedPreparedStatement.<init>(EmbedPreparedStatement.java:118)
>     at org.apache.derby.impl.jdbc.EmbedPreparedStatement20.<init>(EmbedPreparedStatement20.java:82)
>     at org.apache.derby.impl.jdbc.EmbedPreparedStatement30.<init>(EmbedPreparedStatement30.java:62)
>     at org.apache.derby.jdbc.Driver30.newEmbedPreparedStatement(Driver30.java:92)
>     at org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(EmbedConnection.java:678)
>     at org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(EmbedConnection.java:575)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:585)
>     at org.apache.derby.impl.drda.DRDAStatement.prepareStatementJDBC3(DRDAStatement.java:1497)
>     at org.apache.derby.impl.drda.DRDAStatement.prepare(DRDAStatement.java:486)
>     at org.apache.derby.impl.drda.DRDAStatement.explicitPrepare(DRDAStatement.java:444)
>     at org.apache.derby.impl.drda.DRDAConnThread.parsePRPSQLSTT(DRDAConnThread.java:3132)
>     at org.apache.derby.impl.drda.DRDAConnThread.processCommands(DRDAConnThread.java:673)
>     at org.apache.derby.impl.drda.DRDAConnThread.run(DRDAConnThread.java:214)
> Cleanup action completed
> 2005-10-19 18:47:41.983 GMT Thread[DRDAConnThread_3,5,main] (XID = 75504654), (SESSIONID = 0), (DATABASE = /export/home3/tmp/oysteing/tpcbdb), (DRDAID = NF000001.OB77-578992897558106193{1}), Cleanup action starting
> 2005-10-19 18:47:41.983 GMT Thread[DRDAConnThread_3,5,main] (XID = 75504654), (SESSIONID = 0), (DATABASE = /export/home3/tmp/oysteing/tpcbdb), (DRDAID = NF000001.OB77-578992897558106193{1}), Failed Statement is: call SYSIBM.SQLCAMESSAGE(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
> ERROR XSAI2: The conglomerate (8,048) requested does not exist.
>     at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:311)
>     at org.apache.derby.impl.store.access.heap.HeapConglomerateFactory.readConglomerate(HeapConglomerateFactory.java:224)
>     at org.apache.derby.impl.store.access.RAMAccessManager.conglomCacheFind(RAMAccessManager.java:486)
>     at org.apache.derby.impl.store.access.RAMTransaction.findExistingConglomerate(RAMTransaction.java:389)
>     at org.apache.derby.impl.store.access.RAMTransaction.openConglomerate(RAMTransaction.java:1315)
>     at org.apache.derby.impl.store.access.btree.index.B2IForwardScan.init(B2IForwardScan.java:237)
>     at org.apache.derby.impl.store.access.btree.index.B2I.openScan(B2I.java:750)
>     at org.apache.derby.impl.store.access.RAMTransaction.openScan(RAMTransaction.java:530)
>     at org.apache.derby.impl.store.access.RAMTransaction.openScan(RAMTransaction.java:1582)
>     at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getDescriptorViaIndex(DataDictionaryImpl.java:7218)
>     at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getAliasDescriptor(DataDictionaryImpl.java:5697)
>     at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getRoutineList(DataDictionaryImpl.java:5766)
>     at org.apache.derby.impl.sql.compile.StaticMethodCallNode.resolveRoutine(StaticMethodCallNode.java:303)
>     at org.apache.derby.impl.sql.compile.StaticMethodCallNode.bindExpression(StaticMethodCallNode.java:192)
>     at org.apache.derby.impl.sql.compile.JavaToSQLValueNode.bindExpression(JavaToSQLValueNode.java:250)
>     at org.apache.derby.impl.sql.compile.CallStatementNode.bind(CallStatementNode.java:177)
>     at org.apache.derby.impl.sql.GenericStatement.prepMinion(GenericStatement.java:333)
>     at org.apache.derby.impl.sql.GenericStatement.prepare(GenericStatement.java:107)
>     at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(GenericLanguageConnectionContext.java:704)
>     at org.apache.derby.impl.jdbc.EmbedPreparedStatement.<init>(EmbedPreparedStatement.java:118)
>     at org.apache.derby.impl.jdbc.EmbedCallableStatement.<init>(EmbedCallableStatement.java:68)
>     at org.apache.derby.impl.jdbc.EmbedCallableStatement20.<init>(EmbedCallableStatement20.java:78)
>     at org.apache.derby.impl.jdbc.EmbedCallableStatement30.<init>(EmbedCallableStatement30.java:60)
>     at org.apache.derby.jdbc.Driver30.newEmbedCallableStatement(Driver30.java:115)
>     at org.apache.derby.impl.jdbc.EmbedConnection.prepareCall(EmbedConnection.java:771)
>     at org.apache.derby.impl.jdbc.EmbedConnection.prepareCall(EmbedConnection.java:719)
>     at org.apache.derby.impl.drda.DRDAStatement.prepare(DRDAStatement.java:475)
>     at org.apache.derby.impl.drda.DRDAStatement.explicitPrepare(DRDAStatement.java:444)
>     at org.apache.derby.impl.drda.DRDAConnThread.parsePRPSQLSTT(DRDAConnThread.java:3132)
>     at org.apache.derby.impl.drda.DRDAConnThread.processCommands(DRDAConnThread.java:673)
>     at org.apache.derby.impl.drda.DRDAConnThread.run(DRDAConnThread.java:214)
> Cleanup action completed

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.