db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kristian Waagan (JIRA)" <j...@apache.org>
Subject [jira] Updated: (DERBY-2338) improve page allocation when there are many threaded inserts to same table .
Date Wed, 02 Jun 2010 12:29:38 GMT

     [ https://issues.apache.org/jira/browse/DERBY-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Kristian Waagan updated DERBY-2338:
-----------------------------------

    Affects Version/s: 10.7.0.0

> improve page allocation when there are many threaded inserts to same table .
> ----------------------------------------------------------------------------
>
>                 Key: DERBY-2338
>                 URL: https://issues.apache.org/jira/browse/DERBY-2338
>             Project: Derby
>          Issue Type: Improvement
>          Components: Store
>    Affects Versions: 10.2.2.0, 10.7.0.0
>            Reporter: Mike Matrigali
>
> The derby strategy for picking the page to insert a new row on  could be tuned to act
better in a multi-user insert to same table environment.
> The algoriithm currently has the following information to use:
> 1) the number of the last page an insert was done on
> 2) a bit map indicating all the completely empty, but allocated pages
> 3) a bit map indicating all the allocated pages which are "half-filled", where half filled
is very loosely defined.  Basically pages with some
>      rows on it, where a previous insert has not failed because it was too big.
> 4)  2 pointers to where we last left off in a linear search of half filled pages and
free pages.
> Derby  chooses to optimize inserts for future select performance by trying to fit entire
rows on pages.  To this end it uses a 3 try method
> when picking what page to insert a new row on (this is for base tables, referred internally
as heaps).  Not currently when the insert is attempted
> the store does not know how long the row is until after it picks the page to insert on,
and streams it into a log record.
> 1) try to insert entire row on last page an insert was attempted.  If it doesn't fit
move on to next step.
> 2) try to insert entire row onto a "half-filled page", if it doesn't fit move on to next
step.
> 3) get empty page and insert row and overflow any part that does not fit.
> There are a number of optimizations that could be done (if anyone chooses to work on
one it may be better to log a separate jira to track that
> specific project):
> 1) in memory keep track of more than one last page.  In a multi-user environment it may
be better if one got a latch wait on the picked page to
>      try a different page and keep a "group/queue" of last pages - maybe sorted by space
available.  It would be good if such code was zero
> admin in that it could configure itself based on the dynamic concurrency it recognized.
 The current algorithm works pretty well until there are
> concurrent inserts by multiple threads into the same one table at the same time (problems
probably start showing up with either dual core or
> real multi-processor).  Note that  in the insert algorithm described above I believe
there is a problem if many threads hit step 1 and each find
> they can't insert on page 100 but then all choose some different page as part of step
2 and/or 3.  Especially if  there are no unfilled pages each
> may allocate a new page and only the last one to do the insert will remember that as
the "last page" and all subsequent inserts would start at
> that last page.
> 2) For step 2 and 3 one may know the entire size of the row , or at the very least the
minimum size of the row.  This info could be used to better
>     pick a candidate page.
> 3) unfilled page tracking is very limited given only 1 bit per page in the allocation
map - can't really tell difference between 1 byte left and page-1 byte left.  There are a
couple of options.  One could expand the on
> disk allocation maps.  The downside is more disk overhead, and an upgrade issue.  One
could also just do a better job of maintaining in memory
> information in the alloc cache as one reads pages from disk and avoid the on disk changes.
 Just keeping a  queue of recently seen unfilled pages
> inverse by space available might be a big improvement.  The queue need not be the actual
pages, just a page number/space available.  The info
> could be treated as a hint so could avoid extra latching/concurrency issues.
> Please attach any other ideas.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message