db-derby-dev mailing list archives

From Kristian Waagan <Kristian.Waa...@Sun.COM>
Subject Truncation of trailing blanks and lengthless streaming overloads
Date Thu, 22 Jun 2006 12:22:37 GMT
Hello,

I'm working on DERBY-1417; adding new lengthless overloads to the
streaming API.  So far, I have only been looking at implementing this in
the embedded driver.  Based on some comments in the code, I have a few
questions and observations regarding truncation of trailing blanks in
the various character data types.

Type            Trailing-blank truncation   Where
===========================================================================
CHAR            allowed                     SQLChar.normalize
VARCHAR         allowed                     SQLVarchar.normalize
LONG VARCHAR    disallowed                  SQLLongVarchar.normalize
CLOB            allowed                     streaming or
                                            SQLVarchar.normalize, depending
                                            on source

As the table shows, CLOB is the only type for which trailing blanks are
truncated in the streaming class. We must still read all of the data, or
at least as much as is needed to know that the insertion will fail, but
we don't have to hold it all in memory.

Truncation of trailing blanks is not allowed at all for LONG VARCHAR
(according to code comments and bug 5592; I haven't found where this is
stated in the specs).

My question is: should we do the truncation check for CHAR and VARCHAR at
the streaming level as well? If we don't add this feature, inserting a
10GB file into a VARCHAR field by mistake will cause 10GB to be loaded
into memory even though the maximum size allowed is ~32K, possibly
causing out-of-memory errors. The error could instead be generated at an
earlier stage (possibly after reading ~32K + 1 bytes).
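To make the idea concrete, here is a minimal sketch of such a check at
the streaming level. It is not Derby code: the class, the method
'readBounded' and the exception type are hypothetical names made up for
illustration, and it counts characters rather than bytes. It keeps at
most maxLen characters in memory, silently drops excess trailing blanks
when that is allowed, and aborts on the first excess character that may
not be truncated.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical helper for illustration only; not part of Derby. */
final class BoundedStreamCheck {

    /** Hypothetical exception: the data does not fit in the column. */
    static final class TruncationException extends RuntimeException {
        TruncationException(String msg) {
            super(msg);
        }
    }

    /**
     * Reads the stream, keeping at most maxLen characters in memory.
     * Characters past maxLen are discarded if they are blanks and blank
     * truncation is allowed; any other excess character aborts the read.
     */
    static String readBounded(Reader in, int maxLen,
                              boolean allowBlankTruncation)
            throws IOException {
        StringBuilder kept = new StringBuilder();
        char[] buf = new char[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                if (kept.length() < maxLen) {
                    kept.append(buf[i]);
                } else if (!allowBlankTruncation || buf[i] != ' ') {
                    // Fail early instead of buffering the rest.
                    throw new TruncationException(
                        "value exceeds maximum length " + maxLen);
                }
                // Otherwise: an excess trailing blank, drop it.
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) throws IOException {
        // Excess trailing blanks are dropped when truncation is allowed.
        String s = readBounded(new StringReader("abc   "), 3, true);
        System.out.println("[" + s + "]");
        try {
            // A non-blank character past the limit aborts the read.
            readBounded(new StringReader("abcd"), 3, true);
        } catch (TruncationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Passing false for allowBlankTruncation would give the LONG VARCHAR
behaviour from the table, where any excess character, blank or not,
makes the insertion fail.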

As far as I can tell, adding this feature is a matter of modifying the
'ReaderToUTF8' class and the
'EmbedPreparedStatement.setCharacterStreamInternal' method.
One could also optimize the reading of data into LONG VARCHAR by
aborting the read as soon as possible instead of taking it all into
memory first. This would require some special-case handling in the
mentioned locations.


Another matter is that streams will not be checked for exact length
match when using the lengthless overloads, as we don't have a specified
length to compare against.
I have a preliminary implementation for setAsciiStream,
setCharacterStream  and setClob (all without length specifications) in
EmbedPreparedStatement.
I will continue my work by adding methods that throw
not-implemented exceptions, and by implementing the methods where
appropriate.
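
To illustrate the difference in length checking, here is a small sketch
that drains a stream and verifies an exact length match only when a
length was declared. It is an illustration under my own assumptions, not
the preliminary implementation: the class name, the 'drain' method and
the LENGTHLESS marker are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical sketch for illustration only; not part of Derby. */
final class LengthCheckSketch {

    /** Hypothetical marker meaning "no length was specified". */
    static final long LENGTHLESS = -1L;

    /**
     * Reads the whole stream and returns the number of characters read.
     * If a length was declared, the stream must match it exactly;
     * with LENGTHLESS there is nothing to compare against.
     */
    static long drain(Reader in, long declaredLength) throws IOException {
        char[] buf = new char[4096];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
            if (declaredLength != LENGTHLESS && total > declaredLength) {
                throw new IOException(
                    "stream is longer than declared length "
                    + declaredLength);
            }
        }
        if (declaredLength != LENGTHLESS && total != declaredLength) {
            throw new IOException(
                "stream is shorter than declared length "
                + declaredLength);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Declared length matches: accepted.
        System.out.println(drain(new StringReader("hello"), 5));
        // Lengthless: accepted whatever the actual length is.
        System.out.println(drain(new StringReader("hello"), LENGTHLESS));
    }
}
```

With a declared length, a stream that is too short or too long is
rejected; in the lengthless case only checks such as the column's
maximum size remain possible.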


Thoughts and feedback appreciated :)



-- 
Kristian
