Mailing-List: contact derby-dev-help@db.apache.org; run by ezmlm
Precedence: bulk
Reply-To: <derby-dev@db.apache.org>
Message-ID: <71448832.1227118004308.JavaMail.jira@brutus>
Date: Wed, 19 Nov 2008 10:06:44 -0800 (PST)
From: "Kathey Marsden (JIRA)" <jira@apache.org>
To: derby-dev@db.apache.org
Subject: [jira] Updated: (DERBY-637) Conglomerate does not exist after
 inserting large data  volume
In-Reply-To: <5379765.1129886875518.JavaMail.jira@ajax.apache.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


     [ https://issues.apache.org/jira/browse/DERBY-637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kathey Marsden updated DERBY-637:
---------------------------------

    Attachment: noContainerBug.java

Here is the old repro.  It will need some work to run with Derby, since
compress table has changed, imports have changed, etc. 3653 has some
interesting comments regarding the "fix", which seemed to just reduce the
window of opportunity for this bug to occur.  I don't know if things changed
after 3653 or not. Below are the description and comments from the issue:

Description
An application forks 20 threads to update a table (inserts or
deletes, depending on the number of rows in the table).
When the number of rows falls to a low water mark, one thread
will do
    lock table x in exclusive mode
retrying until it succeeds, then
    alter table x compress

The other threads are blocked trying to get read locks, part
way through executing their plans.
Near the end of its work, compress table invalidates plans on
this table, since the conglomerateId has changed for the
underlying store.  However, the blocked threads are already using
their invalid plans, and when they get the lock they get the error
    "Container {N} not found"

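To make the sequence above concrete, here is a minimal sketch of the race
in plain Java, using java.util.concurrent rather than Derby itself. The
class StalePlanRace, its fields, and the single updater thread are all
made up for illustration and are not Derby APIs; the real repro is the
attached noContainerBug.java.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

/** Toy model of the stale-plan race. None of these names are Derby APIs. */
final class StalePlanRace {
    // containerId -> row count; stands in for the store's containers.
    static final Map<Long, Integer> containers = new ConcurrentHashMap<>();
    static final ReentrantLock tableLock = new ReentrantLock();
    static volatile long currentContainerId;

    /** The caller plays the compressor; one extra thread plays an updater. */
    static String runOnce() {
        containers.clear();
        currentContainerId = 1L;
        containers.put(currentContainerId, 100);

        CountDownLatch planBound = new CountDownLatch(1);
        String[] outcome = new String[1];

        Thread updater = new Thread(() -> {
            long plannedId = currentContainerId; // the plan binds to this container id
            planBound.countDown();
            tableLock.lock();                    // waits until compress releases the lock
            try {
                outcome[0] = containers.containsKey(plannedId)
                        ? "ok"
                        : "Container " + plannedId + " not found";
            } finally {
                tableLock.unlock();
            }
        });

        try {
            tableLock.lock();                    // "lock table x in exclusive mode"
            try {
                updater.start();
                planBound.await();               // updater is now part way through its plan
                // "alter table x compress": rows move to a container with a new id.
                long newId = currentContainerId + 1;
                containers.put(newId, containers.remove(currentContainerId));
                currentContainerId = newId;
            } finally {
                tableLock.unlock();              // commit releases the exclusive lock
            }
            updater.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return outcome[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce()); // prints "Container 1 not found"
    }
}
```

The latch only makes the interleaving deterministic: the updater always
binds its plan to the old id before the swap, so it always fails once it
gets the lock.
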
Notes:

I am not sure if this problem is already documented.

I submitted a "fix" which reduces the problem but does not
solve the known race problem with data dictionaries.
Instead of 14 errors we get 1 error now.

Person A wrote:

    The test first does "lock table datatypes exclusive mode" before
    starting the compress.  Some of us thought that if the compress had
    an excl lock it would maybe solve things.

    Here is the problem.

    20 threads are running, either inserting or deleting depending on
    how many rows there currently are in the table.  If we go too high
    we start deleting.

    When we drop below a low water mark, one thread does the "lock
    table excl", then alter table compress.  The other threads (19) are
    part way into executing their delete and block getting a write
    lock.  They are part way into their query plan, right?  Bytecode or
    before that?

    Compress eventually finishes, the Container and conglomerate id
    change, the plans were invalidated, the test commits; I assume the
    lock is released at commit.

    Now some of the updater threads get the lock in turn and get the
    "Container {N} not found" error.  14 errors, not 19.  Why not all
    19, I don't know.  Then everyone must recompile, because there are
    no more errors and we continue on.

    The question is, is there a way to recompile once you get your
    lock but notice your plan is invalidated?  Is wait()/notify() used
    for the locks?  Could we wake them, telling them to check their
    plan validation?

Person B replied:

    I think you have what is going on nailed, but I have no ideas how
    to fix it.  I think this is a known language issue, but I am still
    waiting on comment.

    I think it is too late to stop and retry; if I am not mistaken, an
    arbitrary query could have already begun returning rows to the
    user when it encounters this error (maybe not in this case, but a
    query with a complicated join may).

    It seems the "right" thing to do is to get locks on all tables in
    a plan up front before execution, and then check if the plan is
    valid.  I think this has been considered too major to do.

    No other ideas at this point other than getting a test case,
    logging a bug, and moving on.
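
The "get all the locks up front, then check the plan" idea could look
roughly like the sketch below. It is plain Java and only a model: the
catalog map, the Plan class, and compile/isValid/execute are hypothetical
names, not Derby's language or store interfaces, and it says nothing
about how costly such a change would be in the real code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Toy model of "lock first, then validate". Not Derby code. */
final class LockThenValidate {
    // Hypothetical catalog: table name -> current container id.
    static final Map<String, Long> catalog = new ConcurrentHashMap<>();
    static final Map<String, ReentrantLock> tableLocks = new ConcurrentHashMap<>();

    /** A compiled plan remembers which container id each table had. */
    static final class Plan {
        final TreeMap<String, Long> boundIds = new TreeMap<>();
    }

    static Plan compile(String... tables) {
        Plan plan = new Plan();
        for (String table : tables) {
            plan.boundIds.put(table, catalog.get(table));
        }
        return plan;
    }

    static boolean isValid(Plan plan) {
        for (Map.Entry<String, Long> bound : plan.boundIds.entrySet()) {
            if (!Objects.equals(bound.getValue(), catalog.get(bound.getKey()))) {
                return false;
            }
        }
        return true;
    }

    /** Locks every table in the plan, revalidates, and returns the ids actually used. */
    static Map<String, Long> execute(Plan plan) {
        Deque<ReentrantLock> held = new ArrayDeque<>();
        try {
            // Sorted table order (TreeMap) keeps lock acquisition order consistent.
            for (String table : plan.boundIds.keySet()) {
                ReentrantLock lock = tableLocks.computeIfAbsent(table, t -> new ReentrantLock());
                lock.lock();
                held.push(lock);
            }
            // No rows have been returned yet, so recompiling here is still safe.
            if (!isValid(plan)) {
                plan = compile(plan.boundIds.keySet().toArray(new String[0]));
            }
            return new TreeMap<>(plan.boundIds);
        } finally {
            while (!held.isEmpty()) {
                held.pop().unlock();
            }
        }
    }
}
```

For example, compiling a plan for table "x" while its id is 1, changing
the catalog entry to 2 (as a compress would), and then calling execute
yields a plan bound to id 2 instead of an error.
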

Person C replied:

    This is a classic race condition. The problem is that ALTER TABLE
    COMPRESS gets its exclusive lock near the beginning of its
    execution, but invalidates dependent plans near the end of its
    execution.  We could either eliminate or narrow the window that
    allows the race condition by moving plan invalidation to the
    beginning of the execution of ALTER TABLE COMPRESS. We want it to
    be impossible or unlikely that an inserter or deleter can start
    executing with a conglomerate that's about to go away.

    Another possibility would be for the store to provide a way for
    the new conglomerate to have the same conglomerate id as the old
    conglomerate. The store would also have to take care of any open
    conglomerate controllers and scans that used the old conglomerate.
    I don't know the store well enough to say how hard this would be,
    but I'm guessing it would be very hard.
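
A back-of-the-envelope model of why moving the invalidation earlier
narrows the window without closing it is sketched below. The step names
and their costs are invented for illustration (they are not measurements
and not Derby's actual compress steps); the only point is that the
exposure shrinks from "most of the compress" to "a short span at the
start", but never reaches zero.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy timeline of a compress; step names and costs are made up. */
final class InvalidationWindow {
    /**
     * Returns how many time units pass from the start of compress until
     * plan invalidation has completed. An updater that validates its plan
     * inside that span can block on the lock and wake up with a stale plan.
     */
    static int exposure(boolean invalidateEarly, int copyCost) {
        // Hypothetical compress steps in execution order, with arbitrary costs.
        Map<String, Integer> steps = new LinkedHashMap<>();
        steps.put("takeExclusiveLock", 1);
        if (invalidateEarly) {
            steps.put("invalidatePlans", 1);
        }
        steps.put("copyRowsToNewContainer", copyCost);
        if (!invalidateEarly) {
            steps.put("invalidatePlans", 1);
        }
        steps.put("swapContainerAndCommit", 1);

        int elapsed = 0;
        for (Map.Entry<String, Integer> step : steps.entrySet()) {
            elapsed += step.getValue();
            if (step.getKey().equals("invalidatePlans")) {
                return elapsed;
            }
        }
        throw new IllegalStateException("no invalidation step in timeline");
    }

    public static void main(String[] args) {
        System.out.println(exposure(false, 100)); // prints 102
        System.out.println(exposure(true, 100));  // prints 2
    }
}
```
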

Person B then replied:

    This would be very hard for store.  In all these cases of swapping
    out the container and conglomerate, the id is the unit of recovery,
    and using the "same" id for something that may have to be recovered
    is hard.

    Also, the same type of problem can come about if an index exists on
    a table and then is dropped.  If the plan tries to use the index
    after it has been dropped, there is nothing the store can do in
    that case.

    Moving the invalidation up seems like a good idea, but as Person C
    points out, it doesn't solve it if there is any time when another
    thread can validate its plan and then start executing, block on a
    lock, and when it wakes up find the plan is invalid.


And I made the change to move the invalidate before we start
moving rows from the old to the new table.

This helps the test, but does not solve the real problem.


Hope this helps.


> Conglomerate does not exist after inserting large data  volume
> --------------------------------------------------------------
>
>                 Key: DERBY-637
>                 URL: https://issues.apache.org/jira/browse/DERBY-637
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.2.1.6
>         Environment: Solaris 10 Sparc
> Sun 1.5 VM
> Client/server DB
> 1 GB page cache
> JVM heap on server: min 1 GB, max 3 GB
>            Reporter: Øystein Grøvlen
>         Attachments: noContainerBug.java
>
>
> In a client/server environment I did as follows:
> 1. Started server
> 2. Dropped existing TPC-B tables and created new ones
> 3. Inserted data for 200 million accounts (30 GB account table)
> 4. When insertion was finished, tried to run a TPC-B transaction on same connection and was informed that conglomerate does not exist.  (See stack trace below).
> 5. Stopped client, started a new client to run a TPC-B transaction, got same error
> 6. Restarted server
> 7. Ran client again, and everything worked fine.
> Stack trace from derby.log:
> 2005-10-19 18:47:41.838 GMT Thread[DRDAConnThread_3,5,main] (XID = 75504654), (SESSIONID = 0), (DATABASE = /export/home3/tmp/oysteing/tpcbdb), (DRDAID = NF000001.OB77-578992897558106193{1}), Cleanup action starting
> 2005-10-19 18:47:41.839 GMT Thread[DRDAConnThread_3,5,main] (XID = 75504654), (SESSIONID = 0), (DATABASE = /export/home3/tmp/oysteing/tpcbdb), (DRDAID = NF000001.OB77-578992897558106193{1}), Failed Statement is: UPDATE accounts SET abal = abal + ? WHERE aid = ? AND bid = ?
> ERROR XSAI2: The conglomerate (8,048) requested does not exist.
> 	at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:311)
> 	at org.apache.derby.impl.store.access.heap.HeapConglomerateFactory.readConglomerate(HeapConglomerateFactory.java:224)
> 	at org.apache.derby.impl.store.access.RAMAccessManager.conglomCacheFind(RAMAccessManager.java:486)
> 	at org.apache.derby.impl.store.access.RAMTransaction.findExistingConglomerate(RAMTransaction.java:389)
> 	at org.apache.derby.impl.store.access.RAMTransaction.openConglomerate(RAMTransaction.java:1315)
> 	at org.apache.derby.impl.store.access.btree.index.B2IForwardScan.init(B2IForwardScan.java:237)
> 	at org.apache.derby.impl.store.access.btree.index.B2I.openScan(B2I.java:750)
> 	at org.apache.derby.impl.store.access.RAMTransaction.openScan(RAMTransaction.java:530)
> 	at org.apache.derby.impl.store.access.RAMTransaction.openScan(RAMTransaction.java:1582)
> 	at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getDescriptorViaIndex(DataDictionaryImpl.java:7218)
> 	at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getAliasDescriptor(DataDictionaryImpl.java:5697)
> 	at org.apache.derby.impl.sql.compile.QueryTreeNode.resolveTableToSynonym(QueryTreeNode.java:1510)
> 	at org.apache.derby.impl.sql.compile.UpdateNode.bind(UpdateNode.java:207)
> 	at org.apache.derby.impl.sql.GenericStatement.prepMinion(GenericStatement.java:333)
> 	at org.apache.derby.impl.sql.GenericStatement.prepare(GenericStatement.java:107)
> 	at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(GenericLanguageConnectionContext.java:704)
> 	at org.apache.derby.impl.jdbc.EmbedPreparedStatement.<init>(EmbedPreparedStatement.java:118)
> 	at org.apache.derby.impl.jdbc.EmbedPreparedStatement20.<init>(EmbedPreparedStatement20.java:82)
> 	at org.apache.derby.impl.jdbc.EmbedPreparedStatement30.<init>(EmbedPreparedStatement30.java:62)
> 	at org.apache.derby.jdbc.Driver30.newEmbedPreparedStatement(Driver30.java:92)
> 	at org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(EmbedConnection.java:678)
> 	at org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(EmbedConnection.java:575)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.derby.impl.drda.DRDAStatement.prepareStatementJDBC3(DRDAStatement.java:1497)
> 	at org.apache.derby.impl.drda.DRDAStatement.prepare(DRDAStatement.java:486)
> 	at org.apache.derby.impl.drda.DRDAStatement.explicitPrepare(DRDAStatement.java:444)
> 	at org.apache.derby.impl.drda.DRDAConnThread.parsePRPSQLSTT(DRDAConnThread.java:3132)
> 	at org.apache.derby.impl.drda.DRDAConnThread.processCommands(DRDAConnThread.java:673)
> 	at org.apache.derby.impl.drda.DRDAConnThread.run(DRDAConnThread.java:214)
> Cleanup action completed
> 2005-10-19 18:47:41.983 GMT Thread[DRDAConnThread_3,5,main] (XID = 75504654), (SESSIONID = 0), (DATABASE = /export/home3/tmp/oysteing/tpcbdb), (DRDAID = NF000001.OB77-578992897558106193{1}), Cleanup action starting
> 2005-10-19 18:47:41.983 GMT Thread[DRDAConnThread_3,5,main] (XID = 75504654), (SESSIONID = 0), (DATABASE = /export/home3/tmp/oysteing/tpcbdb), (DRDAID = NF000001.OB77-578992897558106193{1}), Failed Statement is: call SYSIBM.SQLCAMESSAGE(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
> ERROR XSAI2: The conglomerate (8,048) requested does not exist.
> 	at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:311)
> 	at org.apache.derby.impl.store.access.heap.HeapConglomerateFactory.readConglomerate(HeapConglomerateFactory.java:224)
> 	at org.apache.derby.impl.store.access.RAMAccessManager.conglomCacheFind(RAMAccessManager.java:486)
> 	at org.apache.derby.impl.store.access.RAMTransaction.findExistingConglomerate(RAMTransaction.java:389)
> 	at org.apache.derby.impl.store.access.RAMTransaction.openConglomerate(RAMTransaction.java:1315)
> 	at org.apache.derby.impl.store.access.btree.index.B2IForwardScan.init(B2IForwardScan.java:237)
> 	at org.apache.derby.impl.store.access.btree.index.B2I.openScan(B2I.java:750)
> 	at org.apache.derby.impl.store.access.RAMTransaction.openScan(RAMTransaction.java:530)
> 	at org.apache.derby.impl.store.access.RAMTransaction.openScan(RAMTransaction.java:1582)
> 	at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getDescriptorViaIndex(DataDictionaryImpl.java:7218)
> 	at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getAliasDescriptor(DataDictionaryImpl.java:5697)
> 	at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getRoutineList(DataDictionaryImpl.java:5766)
> 	at org.apache.derby.impl.sql.compile.StaticMethodCallNode.resolveRoutine(StaticMethodCallNode.java:303)
> 	at org.apache.derby.impl.sql.compile.StaticMethodCallNode.bindExpression(StaticMethodCallNode.java:192)
> 	at org.apache.derby.impl.sql.compile.JavaToSQLValueNode.bindExpression(JavaToSQLValueNode.java:250)
> 	at org.apache.derby.impl.sql.compile.CallStatementNode.bind(CallStatementNode.java:177)
> 	at org.apache.derby.impl.sql.GenericStatement.prepMinion(GenericStatement.java:333)
> 	at org.apache.derby.impl.sql.GenericStatement.prepare(GenericStatement.java:107)
> 	at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(GenericLanguageConnectionContext.java:704)
> 	at org.apache.derby.impl.jdbc.EmbedPreparedStatement.<init>(EmbedPreparedStatement.java:118)
> 	at org.apache.derby.impl.jdbc.EmbedCallableStatement.<init>(EmbedCallableStatement.java:68)
> 	at org.apache.derby.impl.jdbc.EmbedCallableStatement20.<init>(EmbedCallableStatement20.java:78)
> 	at org.apache.derby.impl.jdbc.EmbedCallableStatement30.<init>(EmbedCallableStatement30.java:60)
> 	at org.apache.derby.jdbc.Driver30.newEmbedCallableStatement(Driver30.java:115)
> 	at org.apache.derby.impl.jdbc.EmbedConnection.prepareCall(EmbedConnection.java:771)
> 	at org.apache.derby.impl.jdbc.EmbedConnection.prepareCall(EmbedConnection.java:719)
> 	at org.apache.derby.impl.drda.DRDAStatement.prepare(DRDAStatement.java:475)
> 	at org.apache.derby.impl.drda.DRDAStatement.explicitPrepare(DRDAStatement.java:444)
> 	at org.apache.derby.impl.drda.DRDAConnThread.parsePRPSQLSTT(DRDAConnThread.java:3132)
> 	at org.apache.derby.impl.drda.DRDAConnThread.processCommands(DRDAConnThread.java:673)
> 	at org.apache.derby.impl.drda.DRDAConnThread.run(DRDAConnThread.java:214)
> Cleanup action completed

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.