db-derby-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Apache Wiki <wikidi...@apache.org>
Subject [Db-derby Wiki] Update of "Derby3192Writeup" by DyreTjeldvoll
Date Sun, 20 Jan 2008 18:17:34 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Db-derby Wiki" for change notification.

The following page has been changed by DyreTjeldvoll:
http://wiki.apache.org/db-derby/Derby3192Writeup

New page:
#pragma section-numbers 2
= Changes introduced in DERBY-3198 =

== Goal ==

The patch implements caching of the isolation level in the client
driver. The caching lets the client driver execute
`getTransactionIsolation()` without a round-trip to the server. It would
also let the `setTransactionIsolation()` be "short-circuited" (and avoid
a round-trip), in the case where the new isolation level is equal to
that cached in the client. The latter optimization is not part of this
patch since it would mean that `setTransactionIsolation()` no longer
would be guaranteed to commit the current transaction. Including it
would have made the client driver behave the same wasy as the embedded
driver in this respect, but unfortunately it would also mean that a
number of tests expecting the old behavior would fail.


== Motivation ==

One would ordinaryily not expect the performance of
getTransactionIsolation() to be critical. However, the use of
connection pools (commonly found in application server environments),
invalidates this assumption. 

A connection pool works by keeping a limitied number of (open)
connections in a pool and letting application (threads) share the
connections in the pool. The JDBC spec mandates that the connection
pool must make sure that all attributes of a connection obtained from
a connection pool must have the same default values as a connection
obatined from a call to `DriverManager.getConnection()` (or
`DataSource.getConnection()`). 

In the case of the isolation level the connection pool manager can
either unconditionally set the isolation level back to its default
value when a connection is returned to the pool (alternatively just
before it is returned from the pool), or it can query the connection
for its isolation level setting and just change it if it is different
from the default. In either case, a driver that doesn't cache the
isolation level will be forced to make an additional round-trip to the
server each time a connection enters (or leaves) the connection pool.


== The Stale Cache Problem ==

Caching the isolation level in the context of JDBC is simple, and
prior to [http://issues.apache.org/jira/browse/DERBY-1148 DERBY-1148]
the client driver would cache the isolation level and only update its
cache whenever setTransactionIsolation() was called. As explained in
[http://issues.apache.org/jira/browse/DERBY-1148 DERBY-1148] this is
not sufficient, as the isolation level can change without
setTransactionIsolation being called (XA, SQL, stored procedures, and
functions). In all of theses cases the cached isolation level becomes
stale (different from the actual isolation level in the embedded
connection used by the NetworkServer.


== Solutions to the Stale Cache Problem ==

One approach to solving the problem would be to scan all activity on
the connection (including statements and prepared statements) to
detect if a new isolation level is being set. This would be able to
detect changes done in SQL, but not those coming from XA state
changes, or those happening in stored procedures. A defensive approach
would be to assume that any stored procedure or XA stat change may
have changed the isolation level so that a refresh from the server is
required. This approach seems to limiting to be truly useful (my
opinion).

The approach that was chosen in this patch was to piggy-back all
changes to the isolation level on the first message going from the
server to client after the change has happened. In the current patch
this approach is combined with the defensive approach mentioned above
by allowing the isolation level to be set to an unknown value which will
trigger a refresh from the server if no piggy-backed
info has reached the client. The defensive approach is used in the XA
attach/detach situation.


=== Implementing Piggy-backing === 

The simplest and least intrusive way of implementing piggy-backing
would be to create a call-back mechanism that would let
EmbedConnection (through the EngineConnection interface) inform its
`DRDAConnThread` that some information that the client is interested
in (in this case the isolation level) has changed. The
`DRDAConnThread` could then append this information to the next
message being sent to the client. There are two problems with this
approach that caused it not to be chosen for this patch.

 1. AFAICT DRDA does not allow the server to add replies to a message
that do not correspond to a request from the client.

 1. The server would need some way of verifying that the client is
capable of interpreting the piggy-backed information.

Another approach is to let the client "poll" the server for cachable
values that have changed. This can be implemented using the
request-reply model of DRDA, and the server would only piggy-back
information to clients that actually requests it, so there is no need
for a hand-shake protocol to find out what the client can handle. (The
client would still need to make sure that it only sends these requests
to servers that can handle them, but this is simply a test on the
servers version number, which is done for things like locators
already). 

The disadvantage of this approach is that polling (as always)
introduces more overhead. Each message going upstream needs to contain
the poll request, and each message going downstream (to the client)
needs to contain the (possibly empty) reply. Each time a poll request
is processed on the server it needs to check if the cachable
information has changed since last poll. This approach was still
chosen in this patch because the first approach seemed to be
incompatible with DRDA.


=== Piggy-backing in DRDA ===

DRDA provedes the EXCSQLSET command to "batch" (for lack of a better
term) SET statements from a client to the server. This command is
currently used to implement the `setQueryTimeout()` functionality. Patch
v1 also uses this DRDA command to implement polling for a changed
isolation level. The client code provides a method
(`Statement.writeSetSpecialRegister()`) for adding an EXCSQLSET command
to a request. This is really a familiy of methods in various classes
and interfaces with the most action concentrated in
`Statement.wrteSetSpecialRegister` and
`NetStatementRequest.writeSetSpecialRegister()`. A large number of
changed lines in the patch can be attributed to the fact that the
signature for this family of methods needed to be changed, (as well as
the fact that javadoc was added for these methods, including the
interface deaclarations).

`setQueryTimeout()` implements its functionality by letting this command
send a special statement which also encodes the timeout value to
set. Patch v1 uses the same idea but uses a different dummy
statement which, just like in the setQueryTimeout case, is intercepted
by the server and given special treatment.

The patch adds a new method to `EmbedConnection` and also to the
`EngineConnection` interface (and thereby also to the BrokeredConnection
interface), that lets the caller iterate over the connection
attributes that have changed since last time this method was called,
(in patch v1 only the isolation level will ever be reported by this
method). 

When the server receives the polling request it always adds
the SQLCARD required by EXCSQLSET to the reply. If a cacheable connection
attribute has changed it is added to the reply in the form of
an SQLSTT codepoint containing a string with the following format:
'key=value', where 'key' identifies the attribute that has changed,
and 'value' is its new value.

There is a `readSetSpecialRegister()` method (corresponding to
`writeSetSpecialRegister()`), which will read the reply on the client
side, and invoke a callback (`completePiggyBackSessionData()`) on the
`am.Connection` object through
`ConnectionCallbackInterface.completePiggyBackSessionData()` will
update the relevant attributes (currently only
`am.Connection.isolation_`).

==== Use of the RTNSETSTT instance variable ====

The DRDA spec specifies (REF) that EXCSQLSET can have an optional
instance variable RTNSETSTT, that essentially is a boolean flag which
allows the requester (the client) to tell the server that it expects
(and can handle) SQLSTT codepoints in the reply. Since v1 is using
such SQLSTT codepoints to transmit the cached attributes "diff", it
also needs to set the RTNSETSTT instance variable when sending the
request. Note that this is essentially "redundant" information since
the server knows exactly what to do as soon as it has identified the
special "statement" used in the polling request. It is only added to
conform to the DRDA spec. 

One could also have omitted the use of RTNSETSTT and rather encoded
the connection attributes in the mandatory SQLCARD. The SQLCARD is
required but will never be used, since the polling request will not
actually execute a SET statement that can fail. 


=== Overhead Introduced by Piggy-backing ===

As mentioned previously the polling approach will necessarily
introduce some overhead both on requests and replies. 

==== Request ====

One EXCSQLSET command is added to every message containing OPNQRY and
CNTQRY commands. Most of the overhead seems to come
from the PKGNAMCSN instance variable which by itself uses 76 bytes, or
59% of the total overhead:

|| DRDA VARIABLE || SIZE (BYTES) || 
DSS header 6
  EXCSQLSET 4
    PKGNAMCSN 76
    RTNSETSTT 4
      HEXSTRDR 5
    SQLSTT 4 + 1 + 4 + 23(strlen) + 1
|| TOTAL || 128 ||
 
This gives a total overhead of (a whopping) 128 bytes. 
It can be reduced by shortening the name used for the "special
statement", but the overhead would still be at least 105 bytes. 

If we also omit the RTNSETSTT instance variable the overhead could be 
further reduced to 96 bytes per request. Which is still substantial.


==== Reply ====

The only mandatory variable in an EXCSQLSET reply is SQLCARD. When
used to poll for modified session data, this variable has no real
meaning, and null is passed in as the SQLException parameter to
`DDMWriter.writeSQLCARD()`. For this patch is was (incorrectly) assumed
that passing null would result in a minimal increase of the
message size. In reality, the DSS size (as reported by
`getDSSLength()`) increases to 65 after adding the SQLCARD (which creates
its own DSS):

|| DRDA VARIABLE || SIZE (BYTES) ||
DSS header 6
  SQLCARD 65
  SQLSTT 4 + 1 + 4 + (4+1)(strlen) + 1 (15)
  ...
|| TOTAL || 71/86 ||

When no session parameter has changed (the common case) only the 71
bytes of the DSS header and the SQLCARD will be added to the reply. If
one or more session parameters have changed, one SQLSTT object will be
added for each changed paramter. The size of the SQLSTT obviously depends on the
value of the parameter, but since SQLSTTs are strings the byte count
will also depend on whether single-byte or multi-byte encoding is used
for the string representing the parameter value. 


== Multiple Round-trip Replies ==

Following the example of `setQueryTimeout()`, calls to
`writeSetSpecialRegister()` and `readSetSpecialRegister()` are made
inside `Statement.flowExecute()`. The request is added in the first
part of the method, prior to calling `...flow()`, and the reply is
read in the latter part of the method, after the blocking call to
`...flow()` returns.

In the case of `setQueryTimeout()`, the request is added to the
beginning of the outgoing message (which is reasonable to ensure that
all the following actions observer the time limit), and the subsequent
reply is the first received and parsed after `...flow()` returns. 

When polling for changed session data the situation is the exact
oposite; the poll request needs to be the last thing added to the
outgoing message to ensure that the poll picks up any changes in the
session state that has happened as a consequence of the other commands
in this message. As a result, the reply must be received and parsed at
the end of `...flow()`, after all the other replies have been parsed.


=== Scrollable Result Sets ===

Most of the time this works well because call to `...flow()` is
followed by a single reply, but for scrollable
result sets this is not the case. If `Statement.isQuery__` is
true and the `Statement.resultSet_' is not null, a call to
`parseScrollableRowset()` is made:

{{{#!java
                if (resultSet_ != null) {
                    resultSet_.parseScrollableRowset();
}}}

This happens irrespective of the type of the result set, but the call
to `parseRowset_()` inside `am.ResultSet.parseScrollableRowset()` is
only made if `scrollable_` is true:

{{{#!java
if (cursor_.dataBufferHasUnprocessedData() && scrollable_) {
            parseRowset_();
}}}

`net.NetResultSet.parseRowSet_()` _may_ call
`flowFetchToCompleteRowset()` directly, or through
`calculateColumnOffsetsForRow_()`, and this call initiates another
roundtrip, while still inside `Statement.flowExecute()`:

{{{#!java
void flowFetchToCompleteRowset() throws DisconnectException {
        try {

            agent_.beginWriteChain(statement_);

            writeScrollableFetch_((generatedSection_ == null) ? statement_.section_ : generatedSection_,
                    fetchSize_ - rowsReceivedInCurrentRowset_,
                    scrollOrientation_relative__,
                    1,
                    false);  // false means do not disard pending
            // partial row and pending query blocks

            agent_.flow(statement_);
            readScrollableFetch_();
            agent_.endReadChain();
}}}


This is problematic because:

 * The QRYDTA parsing does not expect anything to follow the last
 QRYDTA reply in a message, so if all QRYDTAs have not been received,
 it will choke when reading the poll reply.
 * The new round-trip (CNTQRY) initiated by the parsing code will not
 add a new session data poll request, so even if the poll reply
 is handled or ignored by the QRYDTA parsing, there will be no
 poll reply to read when QRYDATA parsing is complete and flow control
 reaches `readSetSpecialRegister()` in 
 `Statement.flowExecute()`, and the call will block forever.

The problem is limited to *read only* result sets since *updatable*
result sets always have a fetchsize equal to 1, regardless of what the
suggested fetchsize is.


=== Possible Solutions ===

The initial idea was to handle this by trying to detect this situation
on the server, and not send a session data reply until all calls to
`flowFetchToCompleteRowset()` had been completed, so that the
session data was handled when control returned to
`Statement.flowExecute()`.

This approach does not work correctly when `fetchsize_` is smaller
than the number of CNTQRYs needed to get a complete row set, because
then the loop calling `flowFetchToCompleteRowset()` will terminate
before the server sends the session data, and then the code
inside `Statement.flowExecute()` trying to read the information will
hang waiting for data which never arrives.

At this point the best solutions seem to be 

 1. Add the piggy-back request command to every CNTQRY and parse it in
every reply (also those handled by `flowFetchToCompleteRowSet()`).

 1. Implement some kind of bookkeeping mechanism that tracks if a
command was included in the last request, and if the return value has
been handled in the latest reply. This is more complicated but reduces
the number of unnecessary piggy-back commands.

Mime
View raw message