db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mike Matrigali (JIRA)" <j...@apache.org>
Subject [jira] Created: (DERBY-2338) improve page allocation when there are many threaded inserts to same table .
Date Wed, 14 Feb 2007 23:51:07 GMT
improve page allocation when there are many threaded inserts to same table .

                 Key: DERBY-2338
                 URL: https://issues.apache.org/jira/browse/DERBY-2338
             Project: Derby
          Issue Type: Improvement
          Components: Store
    Affects Versions:
            Reporter: Mike Matrigali

The derby strategy for picking the page to insert a new row on  could be tuned to act better
in a multi-user insert to same table environment.
The algoriithm currently has the following information to use:
1) the number of the last page an insert was done on
2) a bit map indicating all the completely empty, but allocated pages
3) a bit map indicating all the allocated pages which are "half-filled", where half filled
is very loosely defined.  Basically pages with some
     rows on it, where a previous insert has not failed because it was too big.
4)  2 pointers to where we last left off in a linear search of half filled pages and free

Derby  chooses to optimize inserts for future select performance by trying to fit entire rows
on pages.  To this end it uses a 3 try method
when picking what page to insert a new row on (this is for base tables, referred internally
as heaps).  Not currently when the insert is attempted
the store does not know how long the row is until after it picks the page to insert on, and
streams it into a log record.
1) try to insert entire row on last page an insert was attempted.  If it doesn't fit move
on to next step.
2) try to insert entire row onto a "half-filled page", if it doesn't fit move on to next step.
3) get empty page and insert row and overflow any part that does not fit.

There are a number of optimizations that could be done (if anyone chooses to work on one it
may be better to log a separate jira to track that
specific project):
1) in memory keep track of more than one last page.  In a multi-user environment it may be
better if one got a latch wait on the picked page to
     try a different page and keep a "group/queue" of last pages - maybe sorted by space available.
 It would be good if such code was zero
admin in that it could configure itself based on the dynamic concurrency it recognized.  The
current algorithm works pretty well until there are
concurrent inserts by multiple threads into the same one table at the same time (problems
probably start showing up with either dual core or
real multi-processor).  Note that  in the insert algorithm described above I believe there
is a problem if many threads hit step 1 and each find
they can't insert on page 100 but then all choose some different page as part of step 2 and/or
3.  Especially if  there are no unfilled pages each
may allocate a new page and only the last one to do the insert will remember that as the "last
page" and all subsequent inserts would start at
that last page.
2) For step 2 and 3 one may know the entire size of the row , or at the very least the minimum
size of the row.  This info could be used to better
    pick a candidate page.
3) unfilled page tracking is very limited given only 1 bit per page in the allocation map
- can't really tell difference between 1 byte left and page-1 byte left.  There are a couple
of options.  One could expand the on
disk allocation maps.  The downside is more disk overhead, and an upgrade issue.  One could
also just do a better job of maintaining in memory
information in the alloc cache as one reads pages from disk and avoid the on disk changes.
 Just keeping a  queue of recently seen unfilled pages
inverse by space available might be a big improvement.  The queue need not be the actual pages,
just a page number/space available.  The info
could be treated as a hint so could avoid extra latching/concurrency issues.

Please attach any other ideas.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message