db-derby-dev mailing list archives

From "Mike Matrigali (JIRA)" <derby-...@db.apache.org>
Subject [jira] Resolved: (DERBY-670) improve space reclamation from deleted blob/clob columns which are bigger than a page
Date Tue, 07 Mar 2006 01:13:29 GMT
     [ http://issues.apache.org/jira/browse/DERBY-670?page=all ]
Mike Matrigali resolved DERBY-670:

    Resolution: Fixed

This patch just uses the existing row and column header information to schedule
deleted rows to be post-commit reclaimed immediately if the row is long or has a
long column.  The overhead of just checking the in-memory part of the row on the
current cached page was not much.
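The scheduling decision described above can be sketched roughly as follows. This is a minimal illustration, not Derby's actual code: the names here (RecordHeader, postCommitQueue, onDelete) are hypothetical stand-ins for the row/record header information and post-commit work queue the patch uses.

```java
// Hypothetical sketch of the delete-path check described above: look at
// the in-memory record header on the cached page and queue immediate
// post-commit space reclamation if the row is long or has a long column.
import java.util.ArrayDeque;
import java.util.Queue;

public class ReclaimSketch {
    // Minimal stand-in for what a record header exposes.
    static class RecordHeader {
        final boolean hasOverflow;    // whole row continues on another page
        final boolean hasLongColumn;  // at least one column has a page chain
        RecordHeader(boolean hasOverflow, boolean hasLongColumn) {
            this.hasOverflow = hasOverflow;
            this.hasLongColumn = hasLongColumn;
        }
    }

    // Stand-in for the store's post-commit work queue.
    static final Queue<RecordHeader> postCommitQueue = new ArrayDeque<>();

    // Called from the delete path; cheap because it only inspects the
    // header already cached in memory, not the overflow pages themselves.
    static void onDelete(RecordHeader header) {
        if (header.hasOverflow || header.hasLongColumn) {
            postCommitQueue.add(header); // reclaim space right after commit
        }
    }

    public static void main(String[] args) {
        onDelete(new RecordHeader(false, false)); // short row: not queued
        onDelete(new RecordHeader(false, true));  // long column: queued
        System.out.println(postCommitQueue.size()); // prints 1
    }
}
```

The point of the check being cheap is that nothing beyond the already-cached page is touched at delete time; the expensive chain traversal happens only in the post-commit worker.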

Committed:
m2_142:5>svn commit

Sending        java\engine\org\apache\derby\iapi\store\raw\Page.java
Sending        java\engine\org\apache\derby\impl\store\access\conglomerate\Gener
Sending        java\engine\org\apache\derby\impl\store\raw\data\BasePage.java
Sending        java\engine\org\apache\derby\impl\store\raw\data\StoredRecordHead
Adding         java\testing\org\apache\derbyTesting\functionTests\master\st_recl
Sending        java\testing\org\apache\derbyTesting\functionTests\suites\storete
Sending        java\testing\org\apache\derbyTesting\functionTests\tests\store\Ba
Sending        java\testing\org\apache\derbyTesting\functionTests\tests\store\On
Adding         java\testing\org\apache\derbyTesting\functionTests\tests\storetes
Transmitting file data .........
Committed revision 383663.

> improve space reclamation from deleted blob/clob columns which are bigger than a page
> -------------------------------------------------------------------------------------
>          Key: DERBY-670
>          URL: http://issues.apache.org/jira/browse/DERBY-670
>      Project: Derby
>         Type: Improvement
>   Components: Store
>     Versions:
>     Reporter: Mike Matrigali
>     Assignee: Mike Matrigali
>     Priority: Minor

> Currently Derby space reclamation is initiated only after all the rows on a
> MAIN page are deleted.  When blob/clobs larger than a page are involved,
> the row on the main page only keeps a pointer to a page chain, so the
> main page rows can be very small, and thus many rows may have to be
> deleted before we clean up and reuse the space associated with the blob/clob.
> So in an extreme case of a table with only an int key and one blob column
> of N bytes, with a 32k page Derby probably stores more than 1000 rows per
> main page.  If the app simply does insert/delete of a single row, the table
> can grow to 1000 * N bytes for data that, from the user's point of view,
> should only be on the order of N bytes.
> It would seem reasonable to queue a post commit for any delete which
> includes a blob/clob that has been chained.  This is in keeping with
> the current policy to queue the work when it is possible we can reclaim
> an entire page.  
> The problem is that there would be an extra cost at delete time to 
> determine if the row being deleted has a blob/clob page chain.  The
> actual information is stored in the field header of that particular
> column so currently the only way to check would be to check every
> field header of every column in the deleted row.  From the store's
> point of view every column can be "long" with a page chain -- currently
> it doesn't know that only blob/clob datatypes can cause this behavior.
> Some options include:
> 1. At table create time, ask for input from language to say whether one of
>    these is at all possible, so that the check is never done if not
>    necessary.
> 2. Maintain a bit in the container header with some sort of indication of
>    whether any long row exists, maybe simply 1/0 or a reference count.
>    Note this information is easily available at insert time.
> 3. Maintain a bit in the page header indicating whether any long rows exist.
> 4. Maintain a bit in the record header indicating whether any long columns
>    exist.  Note the existing bit only says whether the whole record is
>    overflowed, not whether a single column is overflowed.
> Options 1-3 would then be used to perform the slow check at delete time
> only if necessary.
> I don't really like option 1 unless we change the storage interface to
> actually check/guarantee the behavior.
> I lean toward option 4, but it is sort of a row format change.  Given that
> the system has room saved for this bit, I believe we can use it without any
> sort of upgrade-time work necessary, though I believe it can only be set on
> a hard upgrade, as there may be old code which does not expect it to be set.
> Soft upgrades won't get the benefit, and existing data won't get the benefit.
> Any other ideas out there?
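Option 4 above can be sketched as a bit in the record header's status flags. The flag values and method names below are illustrative assumptions, not Derby's actual on-disk format; the sketch only shows why a single-bit test replaces the per-field-header scan at delete time.

```java
// Hypothetical sketch of option 4: reserve one bit in the record header
// status byte marking rows that contain a long (overflowed) column, set
// at insert time, so the delete path tests a single bit instead of
// walking every field header.  Bit positions here are illustrative.
public class RecordStatusSketch {
    static final int RECORD_DELETED      = 0x01; // existing-style flag
    static final int RECORD_OVERFLOW     = 0x02; // whole record overflowed
    static final int RECORD_HAS_LONG_COL = 0x04; // proposed new bit

    // Set at insert time, when the store already knows a column overflowed.
    static int setLongColumn(int status) {
        return status | RECORD_HAS_LONG_COL;
    }

    // Cheap delete-time check: one mask test instead of scanning every
    // field header of the deleted row for a page-chain marker.
    static boolean needsImmediateReclaim(int status) {
        return (status & (RECORD_OVERFLOW | RECORD_HAS_LONG_COL)) != 0;
    }

    public static void main(String[] args) {
        int plain = 0;
        int withBlob = setLongColumn(0);
        System.out.println(needsImmediateReclaim(plain));    // false
        System.out.println(needsImmediateReclaim(withBlob)); // true
    }
}
```

Because the bit is written only on insert, pre-existing rows never have it set, which matches the note above that existing data and soft upgrades would not get the benefit.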

This message is automatically generated by JIRA.
If you think it was sent incorrectly, contact one of the administrators:
For more information on JIRA, see:
