Mailing-List: contact derby-dev-help@db.apache.org; run by ezmlm
Precedence: bulk
Reply-To: <derby-dev@db.apache.org>
Message-ID: <213780334.1233947039720.JavaMail.jira@brutus>
Date: Fri, 6 Feb 2009 11:03:59 -0800 (PST)
From: "Mike Matrigali (JIRA)" <jira@apache.org>
To: derby-dev@db.apache.org
Subject: [jira] Updated: (DERBY-4050) Multithreaded clob update causes
 growth in table that does not get reclaimed
In-Reply-To: <1476477017.1233881639670.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/DERBY-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Matrigali updated DERBY-4050:
----------------------------------


In the simplest case, a row with a single long clob looks something like
this:
main page 1 (access level reads this row):
slot 0: long column stored as just a short pointer to a list of overflow pages
        ---> page 2 -> page 3 -> ...

When an update of a long row happens we basically change slot 0 row's column
to point somewhere else but leave the old chain disconnected:
slot 0: long column:
       --->page 100 ->page 101 -> ....

We can't mark the old chain free until commit, since we need to guarantee space in case
of an abort.  But if we lose the post commit for any reason then the chain is lost, because
we don't have any back pointers.  So it is very bad to lose this kind of space reclamation.
In the normal deleted head row case, we can always re-walk the pages and find the deleted
rows again in the future.
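
To make the failure mode concrete, here is a toy model of the head pointer and
overflow chains described above.  It is only a sketch: none of these names are
Derby classes, and the page numbering and chain length are made up.  It assumes
the only way to find an old chain is the page number handed to the post commit
work, so if that work is lost the pages stay allocated but unreachable.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: hypothetical names, not Derby's store classes.
final class OverflowChainSketch {
    // page number -> next page in its chain (null marks the end of a chain)
    private final Map<Integer, Integer> nextPage = new HashMap<>();
    // slot 0 on the main page: pointer to the first overflow page
    private int headPointer;
    private int nextFreePageNumber;

    OverflowChainSketch(int firstPage, int chainLength) {
        nextFreePageNumber = firstPage;
        headPointer = allocateChain(chainLength);
    }

    private int allocateChain(int length) {
        int first = nextFreePageNumber;
        for (int i = 0; i < length; i++) {
            int page = nextFreePageNumber++;
            Integer next = (i + 1 < length) ? Integer.valueOf(page + 1) : null;
            nextPage.put(page, next);
        }
        return first;
    }

    /** Update: point the head at a new chain and return the old chain's first page. */
    int update(int chainLength) {
        int oldHead = headPointer;
        headPointer = allocateChain(chainLength);
        return oldHead;
    }

    /** Post commit reclaim: free every page of the old chain. */
    void reclaim(int oldHead) {
        Integer page = oldHead;
        while (page != null) {
            page = nextPage.remove(page); // remove returns the next page, or null
        }
    }

    /** Pages that are still allocated but not reachable from the head row. */
    Set<Integer> orphanedPages() {
        Set<Integer> orphans = new HashSet<>(nextPage.keySet());
        Integer page = headPointer;
        while (page != null) {
            orphans.remove(page);
            page = nextPage.get(page);
        }
        return orphans;
    }

    /** Runs some updates of a 3-page chain and counts the orphaned pages. */
    static int orphansAfter(int updates, boolean postCommitRuns) {
        OverflowChainSketch row = new OverflowChainSketch(2, 3);
        for (int i = 0; i < updates; i++) {
            int oldHead = row.update(3);
            if (postCommitRuns) {
                row.reclaim(oldHead);
            }
        }
        return row.orphanedPages().size();
    }

    public static void main(String[] args) {
        System.out.println("post commit ran:  " + orphansAfter(5, true));
        System.out.println("post commit lost: " + orphansAfter(5, false));
    }
}
```

In this model every lost post commit leaks a whole chain, which is the kind of
growth that only a compress would get back.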
         

Access is in charge of posting post commit events for page 1, but it is up
to the raw store layer to handle freeing the overflow pages (here 2 and onward).  I believe
this work is handled by the ReclaimSpace class in
java/engine/org/apache/derby/impl/store/raw/data
probably the one commented as "reclaim column chain".

The real work I believe is done in ReclaimSpaceHelper.java, same directory.

My first guess would be that somehow the concurrency is causing the
reclaim space to fail, probably on a latch/latch conflict, and the code
does not handle it.  Depending on what the error is we should probably
wait and/or retry, or maybe requeue the whole post commit.  I suggest adding
some prints to the log in this routine and comparing the working vs non-working
test cases.  It looks like there are some traces you can enable first to
see if they give any good info:

if (SanityManager.DEBUG_ON(DaemonService.DaemonTrace))
{
        SanityManager.DEBUG(
                DaemonService.DaemonTrace, " aborted " + work +
                        " because container is locked or dropped");
}


The following code looks like it could be the problem:
if (!container_rlock.lockRecordForWrite(
        tran, headRecord, false /* not insert */, false /* nowait */))
{
    // cannot get the row lock, retry
    tran.abort();
    if (work.incrAttempts() < 3)
        return Serviceable.REQUEUE;
    else
        return Serviceable.DONE;
}

If the above is the problem, I would be interested if either of these
fixes the problem:
1) change the nowait flag to wait (not a great thing to do in a post commit
   background thread).
2) get rid of the limit of 3 attempts and see if it eventually succeeds.  For
   this test case and multiprocessor concurrency it may never get in.
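
As a rough illustration of why the attempt limit matters, the sketch below
models the requeue logic with stand-in types.  These are hypothetical names,
not Derby's Serviceable or daemon classes, and it assumes incrAttempts()
returns the incremented count, so that a limit of 3 means three lock attempts
in total.  The busyRounds parameter is made up: it is the number of times the
no-wait lock request fails before the row becomes available.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for the post commit queue; not Derby's API.
final class RequeueSketch {
    enum Status { DONE, REQUEUE }

    static final class Work {
        private final BooleanSupplier tryLock; // stands in for the no-wait row lock
        private final int maxAttempts;         // <= 0 means no limit (option 2)
        private int attempts;
        boolean reclaimed;

        Work(BooleanSupplier tryLock, int maxAttempts) {
            this.tryLock = tryLock;
            this.maxAttempts = maxAttempts;
        }

        Status perform() {
            if (!tryLock.getAsBoolean()) {
                attempts++;
                boolean mayRetry = maxAttempts <= 0 || attempts < maxAttempts;
                // Returning DONE here silently drops the reclaim work.
                return mayRetry ? Status.REQUEUE : Status.DONE;
            }
            reclaimed = true;
            return Status.DONE;
        }
    }

    /** Returns true if the chain was eventually reclaimed. */
    static boolean run(int busyRounds, int maxAttempts) {
        int[] remainingBusy = { busyRounds };
        Work work = new Work(() -> remainingBusy[0]-- <= 0, maxAttempts);
        Deque<Work> queue = new ArrayDeque<>();
        queue.add(work);
        while (!queue.isEmpty()) {
            Work next = queue.poll();
            if (next.perform() == Status.REQUEUE) {
                queue.add(next);
            }
        }
        return work.reclaimed;
    }

    public static void main(String[] args) {
        System.out.println("limit 3, busy 5 rounds:  " + run(5, 3));
        System.out.println("limit 3, busy 2 rounds:  " + run(2, 3));
        System.out.println("no limit, busy 5 rounds: " + run(5, 0));
    }
}
```

With the limit in place, a row that stays locked across three attempts causes
the work to be dropped and the chain is never reclaimed.  Without the limit
the work only completes if the lock is eventually free when the daemon tries.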

> Multithreaded clob update causes growth in table that does not get reclaimed
> ----------------------------------------------------------------------------
>
>                 Key: DERBY-4050
>                 URL: https://issues.apache.org/jira/browse/DERBY-4050
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.2.2.0, 10.3.3.0, 10.4.2.0, 10.5.0.0
>            Reporter: Kathey Marsden
>         Attachments: ClobGrowth.java
>
>
> Doing a multithreaded update of a Clob table causes table growth that does not get reclaimed except by compressing the table.  The reproduction has a table with two threads. One thread updates row 1 repeatedly with a 33,000 character clob. The other thread updates row 2 with a small clob, "hello".  The problem occurs back to 10.2 but seems much worse on trunk than 10.2.   The trunk database grew to 273MB after 10000 updates of each row. The 10.2 database grew only to 25MB.  If the update is synchronized there is no growth.
> I will attach the repro.
