Message-ID: <213780334.1233947039720.JavaMail.jira@brutus>
Date: Fri, 6 Feb 2009 11:03:59 -0800 (PST)
From: "Mike Matrigali (JIRA)"
To: derby-dev@db.apache.org
Subject: [jira] Updated: (DERBY-4050) Multithreaded clob update causes growth in table that does not get reclaimed
In-Reply-To: <1476477017.1233881639670.JavaMail.jira@brutus>

     [ https://issues.apache.org/jira/browse/DERBY-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Matrigali updated DERBY-4050:
----------------------------------

In the simplest case a row with a single long clob looks something
like this:

main page 1 (access level reads this row):
    slot 0: long column stored as just a short pointer to a list of overflow pages
            ---> page 2 -> page 3 -> ...

When an update of a long row happens we basically change slot 0 row's column to point somewhere else but leave the old chain disconnected:

    slot 0: long column: ---> page 100 -> page 101 -> ...

We can't mark the old chain free until commit, since we need to guarantee space in case of an abort. But if we lose the post commit for any reason then the chain is lost; we don't have any back pointers. So it is very bad to lose this kind of space reclamation. In the normal deleted head row case, we can always re-walk the pages and find the deleted rows again in the future.

Access is in charge of posting post commit events for page 1, but it is up to the raw store layer to handle freeing the overflow pages (here 2 and onward). I believe this work is handled by the ReclaimSpace class in java/engine/org/apache/derby/impl/store/raw/data, probably the one commented as "reclaim column chain". The real work I believe is done in ReclaimSpaceHelper.java, same directory.

My first guess would be that somehow the concurrency is causing the reclaim space to fail, probably on a latch/latch conflict, and the code does not handle it. Depending on what the error is we should probably wait and/or retry, or maybe requeue the whole post commit. I suggest adding some prints to the log in this routine and comparing the working vs. non-working test cases.
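
To make the suspected failure mode concrete, here is a toy model of a post commit work item that is requeued while it cannot get its lock. This is not Derby code, just a sketch under stated assumptions: ReclaimSketch, ReclaimWork, simulate, perform, lockBusyFor and maxAttempts are all made-up names, and the "lock" is simply unavailable for a fixed number of attempts. It only illustrates why giving up on the work item loses the pages for good when nothing else points at the old chain.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/** Toy model (NOT Derby code) of post commit reclamation of a disconnected overflow chain. */
class ReclaimSketch {
    enum Status { DONE, REQUEUE }

    /** Hypothetical work item: free the pages of one old overflow chain. */
    static final class ReclaimWork {
        final int[] chainPages;
        int attempts = 0;
        ReclaimWork(int... chainPages) { this.chainPages = chainPages; }
    }

    /**
     * Runs the work queue until it is empty and returns the pages still allocated.
     * lockBusyFor: number of initial attempts on which the lock cannot be obtained.
     * maxAttempts: attempt cap before the work is dropped; a value <= 0 means no cap.
     */
    static Set<Integer> simulate(int lockBusyFor, int maxAttempts) {
        ReclaimWork work = new ReclaimWork(2, 3, 4);   // old chain: page 2 -> 3 -> 4
        Set<Integer> allocated = new HashSet<>();
        for (int p : work.chainPages) allocated.add(p);

        Deque<ReclaimWork> queue = new ArrayDeque<>();
        queue.add(work);
        while (!queue.isEmpty()) {
            ReclaimWork w = queue.poll();
            if (perform(w, allocated, lockBusyFor, maxAttempts) == Status.REQUEUE) {
                queue.add(w);                          // try again later
            }
        }
        return allocated;
    }

    static Status perform(ReclaimWork w, Set<Integer> allocated,
                          int lockBusyFor, int maxAttempts) {
        w.attempts++;
        boolean gotLock = w.attempts > lockBusyFor;    // assumption: lock frees up eventually
        if (!gotLock) {
            if (maxAttempts <= 0 || w.attempts < maxAttempts) return Status.REQUEUE;
            return Status.DONE;                        // gives up: the chain is orphaned
        }
        for (int p : w.chainPages) allocated.remove(p); // free every page in the chain
        return Status.DONE;
    }

    public static void main(String[] args) {
        System.out.println("capped at 3, lock busy 5 times, leaked pages: " + simulate(5, 3));
        System.out.println("no cap, lock busy 5 times, leaked pages: " + simulate(5, 0));
    }
}
```

With the lock busy for five attempts, the capped run ends with all three pages still allocated, while the uncapped run frees them on the sixth attempt. A real fix would also have to consider how long a background thread may keep requeueing.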

It looks like there are some traces you can enable first to see if they give any good info:

    if (SanityManager.DEBUG_ON(DaemonService.DaemonTrace))
    {
        SanityManager.DEBUG(
            DaemonService.DaemonTrace,
            " aborted " + work +
            " because container is locked or dropped");
    }

The following code looks like it could be the problem:

    if (!container_rlock.lockRecordForWrite(
            tran, headRecord, false /* not insert */, false /* nowait */))
    {
        // cannot get the row lock, retry
        tran.abort();
        if (work.incrAttempts() < 3)
            return Serviceable.REQUEUE;
        else
            return Serviceable.DONE;
    }

If the above is the problem, I would be interested to know whether either of these fixes it:
1) change the nowait flag to wait (not a great thing to do in a post commit background thread).
2) get rid of the 3 and see if it eventually succeeds. For this test case and multiprocessor concurrency it may never get in.

> Multithreaded clob update causes growth in table that does not get reclaimed
> ----------------------------------------------------------------------------
>
>                 Key: DERBY-4050
>                 URL: https://issues.apache.org/jira/browse/DERBY-4050
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.2.2.0, 10.3.3.0, 10.4.2.0, 10.5.0.0
>            Reporter: Kathey Marsden
>         Attachments: ClobGrowth.java
>
>
> Doing a multithreaded update of a Clob table causes table growth that does not get reclaimed except by compressing the table. The reproduction has a table with two threads. One thread updates row 1 repeatedly with 33,000 character clob. The other thread updates row 2 with a small clob, "hello". The problem occurs back to 10.2 but seems much worse on trunk than 10.2. The trunk database grew to 273MB on trunk after 10000 updates of each row. The 10.2 database grew only to 25MB. If the update is synchronized there is no growth.
> I will attach the repro.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.