db-derby-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Db-derby Wiki] Update of "DataDictionaryCaching" by BryanPendleton
Date Sat, 02 Sep 2006 15:56:29 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Db-derby Wiki" for change notification.

The following page has been changed by BryanPendleton:
http://wiki.apache.org/db-derby/DataDictionaryCaching

The comment on the change is:
Some high level notes on DD caching

New page:
The DataDictionary implementation contains several major caches of descriptors:
 * {{{nameTdCache}}} caches {{{TableDescriptors}}} and can find one by name
 * {{{OIDTdCache}}} caches {{{TableDescriptors}}} and can find one by UUID
 * {{{spsNameCache}}} caches Stored Prepared Statements and can find one by name
 * {{{permissionsCache}}} caches Permissions and can find one by {{{PermissionDescriptor}}}

Since a {{{TableDescriptor}}} object is the root of a tree of objects describing that table
(its columns, its constraints, its triggers, its conglomerates), caching the {{{TableDescriptor}}}
also implicitly caches the table's {{{ColumnDescriptors}}}, its {{{ConstraintDescriptors}}},
its {{{TriggerDescriptors}}} and so forth.
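A minimal Java sketch of how two of these caches relate to the descriptor tree follows. The class names are hypothetical, heavily simplified stand-ins for {{{TableDescriptor}}} and {{{ColumnDescriptor}}}. They use plain {{{HashMap}}}s with no cache manager and no eviction, and the name key ignores schemas. They are not Derby's actual classes or signatures.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical, simplified stand-in for a ColumnDescriptor.
class SketchColumnDescriptor {
    final String name;
    SketchColumnDescriptor(String name) { this.name = name; }
}

// Hypothetical stand-in for a TableDescriptor: the root of a small object tree.
class SketchTableDescriptor {
    final String name;
    final UUID id;
    final List<SketchColumnDescriptor> columns = new ArrayList<>();
    SketchTableDescriptor(String name, UUID id) { this.name = name; this.id = id; }
}

// Two lookup paths over the SAME descriptor objects, loosely modeled on
// nameTdCache (by name) and OIDTdCache (by UUID).
class SketchDescriptorCaches {
    private final Map<String, SketchTableDescriptor> nameTdCache = new HashMap<>();
    private final Map<UUID, SketchTableDescriptor> oidTdCache = new HashMap<>();

    void put(SketchTableDescriptor td) {
        nameTdCache.put(td.name, td);
        oidTdCache.put(td.id, td);
    }

    SketchTableDescriptor byName(String name) { return nameTdCache.get(name); }
    SketchTableDescriptor byId(UUID id) { return oidTdCache.get(id); }
}
```

Both maps hold references to the same object, so a lookup by either key reaches the same column list. Nothing about the columns has to be cached separately.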

Caching is crucial to DataDictionary performance; without it, we would constantly have to
read metadata from the SystemTables on disk. Sharing one copy of the DataDictionary information
in memory among many users also reduces memory footprint. So the DataDictionary tries very
hard to read the SystemTables information into memory as rarely as possible, and tries to
hold it in memory as long as possible.

There are two reasons why the DataDictionary can't always do this:
 * The caches are limited in size, and so the cache manager may not be able to keep all the
tables in memory
 * The SystemTables information is not static: applications may dynamically change the database
schema, by defining new schema objects (tables, views, triggers, constraints, etc.), or by
dropping or modifying existing schema objects.

When the database schema is modified, the DataDictionary uses a very simple mechanism: it
empties the caches and starts over. It does not make any special efforts to determine which
cached information has become invalid, but instead just removes it all.
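Both limits above can be modeled with a small bounded cache: eviction when the cache is full, and the flush-everything response to schema changes. This is a hypothetical sketch built on {{{java.util.LinkedHashMap}}} in access-order (LRU) mode. Derby manages its caches through its own cache manager, whose replacement policy is not reproduced here. {{{SketchBoundedCache}}} and {{{clearOnSchemaChange}}} are illustrative names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a size-bounded cache that evicts the least recently
// used entry when full, and is simply emptied whenever the schema changes.
class SketchBoundedCache<K, V> {
    private final LinkedHashMap<K, V> entries;

    SketchBoundedCache(final int maxEntries) {
        // accessOrder = true: iteration runs from least to most recently used.
        entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict once the cache grows past its size limit.
                return this.size() > maxEntries;
            }
        };
    }

    V get(K key) { return entries.get(key); }

    void put(K key, V value) { entries.put(key, value); }

    int size() { return entries.size(); }

    // The "very simple mechanism": on any schema change, drop everything
    // rather than working out which entries became stale.
    void clearOnSchemaChange() { entries.clear(); }
}
```

Clearing everything costs some re-reads after each DDL statement. In exchange, the cache never has to reason about which descriptors a given schema change affected.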

LanguageSystem code that accesses the DataDictionary must follow the reading/writing protocol
in order to ensure the correct operation of the caches. This protocol involves calling {{{startReading}}}
/ {{{doneReading}}} when reading information from the DataDictionary, and calling {{{startWriting}}}
/ {{{doneWriting}}} when updating the database schema.

For example, {{{CreateTableConstantAction}}} calls {{{startWriting}}} when it creates a
new table in the database. It then generates a new {{{TableDescriptor}}} for the new table
and calls {{{addDescriptor}}} to add the information about the table to the SystemTables.
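The bracketing discipline of the protocol can be sketched in Java on the caller's side. The interface below is hypothetical. It borrows the four method names from the protocol described above, but Derby's real {{{DataDictionary}}} methods take additional arguments and differ in detail. {{{hasTable}}}, {{{addTableDescriptor}}}, {{{LoggingDictionary}}} and {{{SketchDdlClient}}} are illustrative names only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified interface named after the reading/writing protocol.
interface SketchDataDictionary {
    void startReading();
    void doneReading();
    void startWriting();
    void doneWriting();
    boolean hasTable(String tableName);
    void addTableDescriptor(String tableName);
}

// Test double that logs each call so the bracketing order is visible.
class LoggingDictionary implements SketchDataDictionary {
    final List<String> log = new ArrayList<>();
    private final Set<String> tables = new HashSet<>();

    public void startReading() { log.add("startReading"); }
    public void doneReading() { log.add("doneReading"); }
    public void startWriting() { log.add("startWriting"); }
    public void doneWriting() { log.add("doneWriting"); }
    public boolean hasTable(String t) { log.add("hasTable"); return tables.contains(t); }
    public void addTableDescriptor(String t) { log.add("addTableDescriptor"); tables.add(t); }
}

class SketchDdlClient {
    // Reader side: each lookup is bracketed by startReading/doneReading;
    // finally closes the bracket even if the lookup throws.
    static boolean tableExists(SketchDataDictionary dd, String name) {
        dd.startReading();
        try {
            return dd.hasTable(name);
        } finally {
            dd.doneReading();
        }
    }

    // Writer side, loosely modeled on CreateTableConstantAction: announce the
    // schema change first, then record the new table's descriptor.
    static void createTable(SketchDataDictionary dd, String name) {
        dd.startWriting();
        try {
            dd.addTableDescriptor(name);
        } finally {
            dd.doneWriting();
        }
    }
}
```

The point of the pairing is that the dictionary always knows when a schema change is under way. This lets it decide when cached descriptors can no longer be trusted.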

In general, the DataDictionary caching mechanism is trouble-free and efficient. However, at
times it may be useful to understand its operation, both for performance analysis and for
debugging.

For performance analysis, the DataDictionary cache has the following properties:
 * it consumes a certain amount of memory
 * accessing SystemTables information from the cache is vastly more efficient than reading
it from the real tables.
 * DDL statements cause the cache(s) to be flushed
 * a DDL statement in a transaction effectively disables the cache for ''all'' users until the
transaction commits
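The last property, that one uncommitted DDL transaction makes everyone bypass the cache, can be pictured with an illustrative Java model. The counter {{{activeDdlTransactions}}} and the methods {{{ddlStarted}}} and {{{ddlTransactionFinished}}} are hypothetical names. Derby's real bookkeeping for this state differs in detail and is not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model: while any transaction has executed DDL and not yet
// finished, lookups bypass the shared cache for ALL users.
class SketchDdlAwareCache {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> readFromSystemTables;
    private int activeDdlTransactions = 0;
    int diskReads = 0; // counts simulated reads of the system tables

    SketchDdlAwareCache(Function<String, String> readFromSystemTables) {
        this.readFromSystemTables = readFromSystemTables;
    }

    synchronized String lookup(String key) {
        if (activeDdlTransactions > 0) {
            // DDL in flight somewhere: cached data might be stale,
            // so every caller goes to the system tables.
            diskReads++;
            return readFromSystemTables.apply(key);
        }
        String value = cache.get(key);
        if (value == null) {
            diskReads++;
            value = readFromSystemTables.apply(key);
            cache.put(key, value);
        }
        return value;
    }

    // Called when a transaction executes its first DDL statement.
    synchronized void ddlStarted() {
        activeDdlTransactions++;
        cache.clear();
    }

    // Called when a transaction that executed DDL commits or rolls back.
    synchronized void ddlTransactionFinished() {
        activeDdlTransactions--;
        if (activeDdlTransactions == 0) {
            cache.clear(); // start over from a clean state
        }
    }
}
```

In this model, every lookup made between {{{ddlStarted}}} and {{{ddlTransactionFinished}}} costs a read of the system tables, whichever user makes it. This is why a long-running transaction containing DDL can hurt performance for everyone.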

For debugging, consider DERBY-1724, an interesting case in which DataDictionary caching plays a role.
DERBY-1724 was a manifestation of DERBY-1583, an underlying bug involving an incorrect assumption
about the {{{ColumnDescriptor}}}
object. A {{{ColumnDescriptor}}} object may or may not have an internal pointer to a corresponding
{{{TableDescriptor}}} object. When a {{{ColumnDescriptor}}} is first created by {{{SYSCOLUMNSRowFactory}}},
it does not have a {{{TableDescriptor}}} pointer. This is because not all {{{ColumnDescriptor}}}
objects are necessarily tied to particular tables; some may be expressions computed at runtime,
for instance. At the point where {{{FromBaseTable}}} determines that a particular {{{ColumnDescriptor}}}
is definitely tied to a particular {{{TableDescriptor}}}, it sets the table descriptor pointer
in that {{{ColumnDescriptor}}}. Since {{{ColumnDescriptor}}} objects are cached, this updated
object remains in memory for subsequent use. This means that code which uses the {{{ColumnDescriptor}}}
may or may not find that the table descriptor pointer has already been set, depending on whether
or not the cache has managed to retain the descriptor in memory since the pointer was set.
And, to close the chain of logic, the DERBY-1724 bug script contains a DDL statement (GRANT)
in a transaction, which causes the cache to be disabled and thus enables the conditions for
the DERBY-1583 bug to be triggered.
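A stripped-down Java sketch shows why the DERBY-1583 pattern depends on cache history. {{{SketchColumn}}}, {{{SketchColumnCache}}} and the helper methods are illustrative names, not Derby code. The {{{flush}}} call stands in for the cache being cleared or disabled by DDL such as the GRANT above. The defensive variant shows one way to avoid relying on cache history. It is not a description of how DERBY-1583 was actually fixed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical table and column stand-ins.
class SketchTable {
    final String name;
    SketchTable(String name) { this.name = name; }
}

class SketchColumn {
    final String name;
    SketchTable table; // null when freshly built from the catalog
    SketchColumn(String name) { this.name = name; }
}

class SketchColumnCache {
    private final Map<String, SketchColumn> cache = new HashMap<>();

    // Returns the cached object if present; otherwise builds a fresh one with
    // no table pointer (as a newly read catalog row would have).
    SketchColumn get(String name) {
        return cache.computeIfAbsent(name, SketchColumn::new);
    }

    void flush() { cache.clear(); }
}

class SketchColumnUsers {
    // Analogous to FromBaseTable binding a column to its table.
    static void bind(SketchColumn col, SketchTable table) { col.table = table; }

    // Buggy consumer: assumes an earlier bind already set the pointer.
    static String qualifiedNameBuggy(SketchColumn col) {
        return col.table.name + "." + col.name; // NullPointerException if table is null
    }

    // Defensive consumer: works whether or not the pointer was set.
    static String qualifiedName(SketchColumn col, SketchTable knownTable) {
        SketchTable t = (col.table != null) ? col.table : knownTable;
        return t.name + "." + col.name;
    }
}
```

Before the flush, the buggy consumer succeeds only because an earlier {{{bind}}} mutated the cached object. After the flush, the same call receives a fresh descriptor with no table pointer and fails.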
