db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dag H. Wanvik" <Dag.Wan...@Sun.COM>
Subject Proposal for scrollable, updatable, insensitive result sets
Date Tue, 08 Nov 2005 17:36:47 GMT

Hi,

I just created a new JIRA (690) to add this feature for Derby and
uploaded a text file containing the details of the
proposal. Hopefully, you will find this feature could a valuable
addition to Derby. We (Fernanda and Andreas will help me with this
feature) would appreciate your views and comments on this proposal!

I enclose the text here for your convenience.

Thanks,
Dag

----------------------------------------------------------------------
	Proposal for scrollable, updatable result sets in Derby
			     Dag H Wanvik
			     revision 1.0

Table of contents

* Introduction
* Staging
* Rationale for supporting TYPE_SCROLL_INSENSITIVE 
* Specification for {TYPE_SCROLL_INSENSITIVE, CONCUR_UPDATABLE} in Derby
* Some implementation ideas
* Appendix I: Related JIRA issues

* Introduction

JDBC result sets are created with three properties: type, concurrency
and holdability. The type can be one of TYPE_FORWARD_ONLY,
TYPE_SCROLL_INSENSITIVE and TYPE_SCROLL_SENSITIVE. The concurrency can
be one of CONCUR_READ_ONLY and CONCUR_UPDATABLE. The holdability can
be one of HOLD_CURSORS_OVER_COMMIT and CLOSE_CURSORS_AT_COMMIT.

JDBC allows the full cross product of these. SQL 2003 prohibits the
combination {TYPE_SCROLL_INSENSITIVE, CONCUR_UPDATABLE}, but this
combination is supported by some vendors, notably Oracle.

Currently, Derby supports JDBC result sets in a limited
way. Holdability is supported. Furthermore, the following is
supported: 
	   - forward-only, read-only 
	   - forward-only, updatable (update, delete, but not insert)
	     Also, in the network driver, support for some data types
	     conversions is missing:

	     [updateString on SMALLINT, INTEGER, BIGINT, DECIMAL datatypes 
	      updateBytes on CHAR, VARCHAR, LONG VARCHAR datatypes 
	      updateTime on TIMESTAMP datatypes 
	      updateObject with null values 
	      updateClob and updateBlob]
            
	     Call it JIRA-y. 
	     See also appendix I containing a list of related JIRA issues.

	   - scroll insensitive, read-only

We (Fernanda and Andreas will cooperate with me on this) propose a
plan to add support for the combination:

	   - scroll insensitive, updatable

As a part of this we would also like to add the missing insert
operation to the {forward-only, updatable} result sets (JIRA-100), and
remove the requirement for an explicit "FOR UPDATE" clause in the SQL
query to achieve updatability if CONCUR_UPDATABLE is specified
(JIRA-231).

* Staging

We propose to break up the development in the following parts, not
necessarily in that order:

   - JIRA-231
   - JIRA-100 support in embedded driver
   - JIRA-100 support in network driver (minus JIRA-y)
   - JIRA-x [support for {TYPE_SCROLL_INSENSITIVE, CONCUR_UPDATABLE}]
     in embedded driver
   - JIRA-x in network driver
   - JIRA-y for both TYPE_FORWARD_ONLY and TYPE_SCROLL_INSENSITIVE.
   - JIRA-517
   - JIRA-611

More work is needed to determine which, if any, of the lob issues in
appendix I should/need be fixed as part of this effort. E.g. Do we
need locators?

* Rationale for supporting TYPE_SCROLL_INSENSITIVE 

The SQL standard calls for insensitive result sets to be read-only¹,
but JDBC is agnostic on this issue.

¹[SQL 2003, vol 2, section 14.1 <declare cursor>, syntax rules:
  :
  10) If <updatability clause>is not specified,then
      Case: a) If either INSENSITIVE,SCROLL,or ORDER BY is
               specified, or if QE is not a simply updatable table, then
               an <updatability clause> of READ ONLY is implicit.
            b) Otherwise, an <updatability clause> of FOR UPDATE without
	       a <column name list> is implicit.
  11) If an <updatability clause> of FOR UPDATE with or without a <column
      name list> is specified, then INSENSITIVE shall not be specified and
      QE shall be updatable.
 ]

Our preliminary investigations have shown that implementing the
combination {sensitive, updatable} is much harder, since the current
B-tree implementation does not allow backward scans. The current
read-only, scrollable result sets rely on a cache of the visited rows
to be stored in a disk backed hash table (the hash index being the row
number in the result set) for its implementation of navigating the
result set. Hence, the scan over the underlying tables (whether via an
index or not) can still be limited to forward scans. The hash table is
currently filled up as the result set is read via ResultSet#next(). A
call to ResultSet#absolute(-1) would force the entire result set to be
read and copied to the hash table, before the positioning on the final
row.

We think having updatable result sets, even if not SQL compliant,
would be a useful addition to Derby. If/when backwards scans of
B-trees is implemented (cf. discussions of moving in the direction in
the thread
http://comments.gmane.org/gmane.comp.apache.db.derby.devel/9358), we
could consider also adding sensitive updatable result sets, but this
is not part of this proposal.

We propose to extend the current scrollable implementation to support
updates, deletes and inserts. Since the semantics of {insensitive,
updatable} is not defined by SQL, we informally describe meaningful
semantics for it here.

* Specification for {TYPE_SCROLL_INSENSITIVE, CONCUR_UPDATABLE} in Derby

The semantics of result sets allow for some variation, viz. the size of
the chapter in the JDBC tutorial book about it. Metadata calls can be
used to inquire about the exact behavior of a particular
implementation. The following DatabaseMetadata methods determine the
behavior:

		supportsResultSetType(type)
		supportsResultSetHoldability(holdability) 
		supportsResultSetConcurrency(type, concurrency)

		deletesAreDetected(type)
		insertsAreDetected(type)
		updatesAreDetected(type)

		othersDeletesAreVisible(type)
		othersInsertsAreVisible(type)
		othersUpdatesAreVisible(type)

		ownDeletesAreVisible(type)
		ownInsertsAreVisible(type)
		ownUpdatesAreVisible(type)

** supportsResultSetType

Derby will return
      supportsResultSetType(TYPE_FORWARD_ONLY) -> true
      supportsResultSetType(TYPE_SCROLL_INSENSITIVE) -> true
      supportsResultSetType( TYPE_SCROLL_SENSITIVE) -> false

This is the present behavior as well, so no change here.

** supportsResultSetHoldability

Derby will return
      supportsResultSetHoldability(HOLD_CURSORS_OVER_COMMIT) -> true
      supportsResultSetHoldability(CLOSE_CURSORS_AT_COMMIT) -> true

No change.

** supportsResultSetConcurrency

Derby will return
      supportsResultSetConcurrency(TYPE_FORWARD_ONLY, CONCUR_READ_ONLY) -> true
      supportsResultSetConcurrency(TYPE_FORWARD_ONLY, CONCUR_UPDATABLE) -> true
      supportsResultSetConcurrency(TYPE_SCROLL_INSENSITIVE, CONCUR_READ_ONLY) -> true
      supportsResultSetConcurrency(TYPE_SCROLL_INSENSITIVE, CONCUR_UPDATABLE) -> true [new]
      supportsResultSetConcurrency(TYPE_SCROLL_SENSITIVE, CONCUR_UPDATABLE) -> false
      supportsResultSetConcurrency(TYPE_SCROLL_SENSITIVE, CONCUR_READ_ONLY) -> false

** ownDeletesAreVisible
** ownInsertsAreVisible
** ownUpdatesAreVisible
** othersDeletesAreVisible
** othersInsertsAreVisible
** othersUpdatesAreVisible

First a note on the meaning nof "own" and "others" as used by JDBC:
"Others" here means changes made by "other transactions" or "other
objects" in the same transaction. "Own" means changes made on the
result set object.

Modifications made by others are not visible in an insensitive result
set. However, own changes may or may not be visible, cf. the metadata
methods provided. to determine this.

Given the planned implementation using the cached rows in the hash
table, we have a choice here: When rows are deleted, updated or
inserted, we can either let those changes be reflected in only the
underlying table *or* in both the underlying table and in the result
set.

We propose to let own deletes (deleteRow) and updates (updateRow)
modify the result set and thus be visible, as well as modyfying the
the underlying table. We will not attempt to determine if an updated
row no longer qualifies for the result set, it will remain in the
result set after an update.

We propose for Inserts (insertRow) *not* to be visible in the result
set, only in the underlying data. This allows us to avoid the question
of whether the row qualifies the original query or not. Also, it allows
us to keep the result set size unchanged, provided we also allow for
"holes" in the result set when rows are deleted.

This semantic will allow the user to see the changed rows (or the
deletions) when scrolling back and forth in the result set and this
seems preferable to not reflect any change.  Also, the user would be
able to continue using absolute() and relative() because the position
of each row in the result set does not change due to deletes/inserts.

There is an asymmetry here (inserts are not visible, whereas updates and
deletes are), but we deem this to be acceptable. 

The proposal to omit inserted rows is just pragmatic. A symmetric
solution would be to append own inserts at the end of the result set,
without qualifying them. Another symmetric solution would be to
re-qualify updated rows and expel them if they no longer qualify, as
well as appending only qualifying rows when inserted (via insertRow).

Derby will then return:

      ownDeletesAreVisible(TYPE_SCROLL_INSENSITIVE) -> true
      ownInsertsAreVisible(TYPE_SCROLL_INSENSITIVE) -> false¹
      ownUpdatesAreVisible(TYPE_SCROLL_INSENSITIVE) -> true

For changes made by others, the definition is clear and we have:

      othersDeletesAreVisible(TYPE_SCROLL_INSENSITIVE) -> false
      othersInsertsAreVisible(TYPE_SCROLL_INSENSITIVE) -> false¹
      othersUpdatesAreVisible(TYPE_SCROLL_INSENSITIVE) -> false

¹The insert presents an interesting case: If the result set is not
fully materialized when an insert occurs (i.e. has not yet been
accessed via the cursor), the insert will be reflected in the result
set iff the row would qualify and its location in the scan is after
the currently materialized rows. This applies for inserts by both
"others" and "own". 

Note: A similar behavior can be observed for forward-only result sets
   in Derby: one may or may not see a qualifying tuple inserted by
   "others" (and "own" when JIRA-100 is fixed) while moving through
   the result set, depending on its location. Does this effect really
   amount to "sensitive" in the sense of the standard?

   Currently, Derby returns true for all three
   others<X>AreVisible(TYPE_FORWARD_ONLY) methods. On a first pass
   through a result set, the same level of visibility will actually be
   the case for others<X>AreVisible(TYPE_SCROLL_INSENSITIVE), since
   the result set is materialized incrementally also for scrollable
   updatable insensitive, but by definition this should return false,
   not true... This is slightly disturbing.

Interestingly, we found after choosing on the above semantics, that
Oracle has chosen the same visibility for TYPE_SCROLL_INSENSITIVE,
cf. table 17-1 in "Oracle Database JDBC Developer's Guide and
Reference 10g Release 1 (10.1)", page 17-20.

** deletesAreDetected
** insertsAreDetected
** updatesAreDetected

Even when changes are visible, the implementation may or may not be
able to tell whether a row in the result set has been deleted
("holes"), updated, or inserted since the result set was created. We
propose for Derby to support the detection for the two operations that
we make visible:

      deletesAreDetected(TYPE_SCROLL_INSENSITIVE) -> true
      insertsAreDetected(TYPE_SCROLL_INSENSITIVE) -> false
      updatesAreDetected(TYPE_SCROLL_INSENSITIVE) -> true

Thus calls to rowUpdated and rowDeleted can be meaningfully used by
the application.

** refreshRow

We do not plan to implement ResultSet#refreshRow, since for
insensitive result sets it is defined by JDBC to be a no-op, anyway.
Note that Oracle does update the result set from the underlying table
(even if insensitive) when using refreshRow.

** Update conflicts

[This discussion presumes an implementation based on the result set
rows being *copied* to achieve insensitivity, as currently is done in
the Derby read-only scrollable insensitive case, which we propose to
build on]

The question is: Is it acceptable that an update or delete take place
if the underlying row has been changed or deleted after the copy was
taken (overwrite)?

If the answer is yes (solution 1), we don't care, then there is no
problem. [Oracle does this, section on "Update conflicts" for
updatable result sets, page 17-15, ibid]

If potential overwrite is a problem for the user, she must increase
the isolation level. [Oracle offers exclusive locking by specifying
"FOR UPDATE" in the query to guard against such overwrites.]

Currently in Derby the user can set isolation level for a SELECT
statement to one of {RR|RS|CS|UR}, which although non-standard, would
be useful to achieve better concurrency than just raising the
transaction's (global) isolation level. We could even extend this to
offer exclusive locking, perhaps, if someone needs Oracle-like
(exclusive) locking on the result set. However, we do not propose to
do this as part of this effort.

If the answer is no, however, we have at least two choices:

	   - (Solution 2) Set read locks or (better) update locks on
             all visited (materialized) rows for the lifetime of the
             result set, so we avoid rows changing "under our feet".

	   - (Solution 3) Optimistic concurrency: Keep no locks on the
	     rows once read and if an update or delete occurs, check
	     if the row has been modified. If it has, throw an
	     exception and roll back the transaction.

Solution 2 is the more conservative, but at the cost of reduced
concurrency. Also, in our opinion an insensitive result set looses a
bit its point if it does not allow changes to happen to the underlying
table.

If we choose solution 3, the user must be prepared to handle
exceptions and retry the transaction.

This situation is somewhat analogous to the case where, for sensitive
result sets, an update or delete operation is based on data in a
prefetch buffer. If the prefetched view was not up-to-date for a row
attempted changed, should the modification fail or succeed? 

What would be the prefered behavior for Derby? Solution one, two or
three? Again, the standard does not offer any help, we need to make an
informed decision here. 

For solution one, updateRow would just clobber any update performed by
others. If the row has been deleted by others in the meantime, should
the updateRow fail or possibly re-insert the row? We prefer to fail
the update operation if the row has been deleted, since silently
changing an update into an insert would seem to be problematic,
possibly raising exceptions normally associated with insert. 

For solution two, we pay with less concurrency, if few rows are
modified.

For solution three, we force the user to deal with retryable aborts,
which is undesirable.

Conclusion: We want to build solution one for now, preferring to leave
it to the user to increase isolation level if update conflicts is a
problem.


** Query limitations for updatable result sets

The set of queries that are updatable via the scrollable updatable
result set would be the same as for the forward-only updatable result
set.

* Some implementation ideas

The basic idea is to combine the current scrollable, read-only
implementation with the implementation of the forward-only updatable
result sets, which builds on the UPDATE...WHERE CURRENT OF
implementation. 

We plan to build on the present disk-backed cache which is currently
used to implement the scrollable, read-only result sets. 
The cache is filled up with copies of the rows read from the
underlying table incrementally, as the cursor moves through the result
set. 


By adding row location (RowLocation) information, we are able to
reposition back to an earlier visited row. This would reset the update
lock on the (new) current row, if not kept (solution 1 above). By
reusing the UPDATE..WHERE CURRENT OF implementation we can then do an
updateRow/deleteRow operation if requested.  We also need information
to mark a row as deleted and updated (to be able to implement holes
and delete detection), and possibly information to perform the
up-to-date check, if we choose to go for the optimistic concurrency
(solution 3 above).

When the result set is closed we need to release (or downgrade to read
locks depending on isolation level) all update locks, if we choose the
conservative locking approach (solution 2).  (Actually updated rows
have write locks which are held to the end of the transaction as
usual.)

* Appendix I: Related JIRA issues

517 - ResultSet - relative(int rows) behaves different in embedded and
      client/server mode when the positioning before the first row or
      after the last row.
      Fernanda already has a patch.

610 - After commit, holdable result set cursor used in positioned
      update can't access row. Awaiting answer from Lance on this, may
      not be a bug (Dag)

611 - Network client: updateRow() causes a commit when autoCommit =
      true 







-- 
Dag H. Wanvik
Sun Microsystems, Web Services, Database Technology Group
Haakon VII gt. 7b, N-7485 Trondheim, Norway
Tel: x43496/+47 73842196, Fax:  +47 73842101

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
NOTICE: This email message is for the sole use of the intended
recipient(s) and may contain confidential and privileged
information. Any unauthorized review, use, disclosure or distribution
is prohibited. If you are not the intended recipient, please contact
the sender by reply email and destroy all copies of the original
message.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 

Mime
View raw message