db-derby-dev mailing list archives

From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: Enforcing length restrictions for streams of unknown length
Date Wed, 02 Aug 2006 17:58:15 GMT
This may or may not work; I'm not sure.  Here is some stuff to be aware of.
With this approach, the store will go ahead and insert and log data
into the database.  For it to work correctly you will
have to make sure that the resulting error from the limit at least
aborts the statement doing the insert/update.

My guess is that you are going to get some sort of STORE exception
with the limit exception wrapped below it.  I would not be surprised
if the current store exception is more severe than you want it to
be, since the current code does not expect this error - you may have
to define a new, less severe error for this case.  There may be more
than one exception path.  Make sure to test both the case where
the inserted blob exceeds the page size and the case where it is
smaller than the page size.  Offhand, I think this
is a new path for store; I can't think of any case where we expect
to get an exception while reading a stream for insert/update.
With user-defined types there used to be an exercised code path when
the user tried to READ more data than existed in the database -
but that path is no longer exercised since those datatypes were
removed.

This means that after your change a lot more data on disk may be
allocated to the file and the log than before.  Some of
this space may never be reclaimed if there are no subsequent inserts
or explicit compresses.  Probably the worst case would be a user
bug where they defined a default 2 gig blob column and somehow
generated an infinite loop in the feeding stream - this would then
use 2+ gig of log file and grow the table to 2 gig before returning
the error.

This is not a new problem; it is similar to how unique key violations
work today.  A row with many indexes will be inserted, and many
indexes may be updated before hitting the uniqueness problem.  At that
point all the work is aborted.

I do think that when the length is known we should do the checking up
front, rather than pay the abort and possible space-reclamation
penalty.  So unfortunately that would mean multiple paths through
the datatype insert/update stream code.
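When the length is known up front, the check is a simple comparison before any data is streamed; when it is not, a wrapper stream along these lines would fail as soon as the column width is exceeded. This is only a minimal sketch - the class name, exception type, and message are illustrative, not Derby's actual code:

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical sketch of a limit-stream: wraps the application's
 * stream and throws once more bytes than the column allows have
 * been read.  Not an actual Derby class.
 */
class LimitInputStream extends FilterInputStream {
    private final long limit;   // maximum bytes the column can hold
    private long bytesRead;     // bytes handed out so far

    LimitInputStream(InputStream in, long limit) {
        super(in);
        this.limit = limit;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1 && ++bytesRead > limit) {
            throw new IOException(
                "stream exceeds maximum column width of " + limit + " bytes");
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0 && (bytesRead += n) > limit) {
            throw new IOException(
                "stream exceeds maximum column width of " + limit + " bytes");
        }
        return n;
    }
}
```

Note that the exception surfaces only after the excess bytes have already been pulled from the stream, which is exactly why the store-side insert/log work described above has to be aborted when it fires.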

Kristian Waagan wrote:
> Hello,
> 
> My initial work on the new JDBC4 length-less overloads is approaching 
> completion, but I still have one issue that must be solved.
> 
> Currently, streams with unknown length are materialized to determine the 
> length. This is the approach I have implemented in the client driver, for 
> lack of a better solution at the moment. But the approach is also used 
> in the embedded driver, and there it is simply not good enough.
> 
> If I pass the stream down to the storage layer, bypassing the length 
> checks done by the data type classes (SQLBinary, SQLBlob etc), the 
> storage layer will insert all the data it can get. For instance, I can 
> insert 3KB into a 2KB Blob column.
> To solve this, I plan to wrap the user/application stream in a 
> limit-stream. This stream will cause an exception to be thrown once more 
> data has been read than the column it is being inserted into allows.
> 
> In addition to the maximum length issue, there is also that of 
> truncation of trailing blanks. I don't yet fully understand what I have 
> to change. Much of the functionality I need is already in place, but 
> some changes might be required. For instance, the column width and 
> whether truncation is allowed or not might need to be passed down to the 
> limit-mechanism.
> 
> Questions, suggestions or other feedback are appreciated!
> 
> 
> 
> Related issues are DERBY-1473 and DERBY-1417.
> I plan to finish this for 10.2.
> 
> 
> 
> Thanks,
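The trailing-blank wrinkle Kristian mentions for character columns could work roughly like this: data past the declared width is acceptable only when the excess is all blanks, in which case it is silently truncated. This is an illustrative helper under that assumption, not actual Derby code:

```java
/**
 * Hypothetical sketch of the trailing-blank rule for character
 * columns.  Not an actual Derby class.
 */
class TrailingBlankCheck {
    /**
     * Returns the value truncated to the column width, or throws if
     * anything beyond the width is not a blank.
     */
    static String truncate(String value, int width) {
        if (value.length() <= width) {
            return value;
        }
        // Excess characters are only tolerated if they are all spaces.
        for (int i = width; i < value.length(); i++) {
            if (value.charAt(i) != ' ') {
                throw new IllegalArgumentException(
                    "value too long for column of width " + width);
            }
        }
        return value.substring(0, width);
    }
}
```

This is why the column width and the truncation policy would need to be passed down to the limit mechanism: a blob column can fail as soon as the limit is crossed, while a character column has to keep reading to see whether the excess is only blanks.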

