db-derby-dev mailing list archives

From "Kristian Waagan (JIRA)" <j...@apache.org>
Subject [jira] Updated: (DERBY-2618) EmbedClob.setAsciiStream does not handle non-ascii characters correctly
Date Mon, 14 May 2007 10:24:16 GMT

     [ https://issues.apache.org/jira/browse/DERBY-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kristian Waagan updated DERBY-2618:
-----------------------------------

    Attachment: Derby2618BugsInClobAsciiStream.java

It seems there is more than one kind of bug in this area.
From the user's perspective, you can write any int to the clob ascii stream, since it is
an OutputStream and has the method write(int).

I see errors happening in two places: some when calling write, and some when calling read to
get the value back.
It would be nice if someone could try the repro and confirm my findings, or let me know if I have
been using the JDBC API incorrectly. Note that the repro must be run with Java SE 6 because
it uses Connection.createClob().


I have not looked at this in detail, but I think the stream used to write to the Clob must
do some filtering on the incoming value.
This typically means ANDing it with 0xff and/or replacing the incoming int with the Unicode
replacement character for unknown characters (\uFFFD).
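To make the two filtering options concrete, here is a minimal sketch. The class and method names are hypothetical and this is not Derby code; it only shows what each strategy would do to a value passed to write(int). Masking mirrors the contract of java.io.OutputStream.write(int), which uses the eight low-order bits and ignores the rest.

```java
// Hypothetical helper, not part of Derby: two ways to filter an int
// handed to write(int) on an "ASCII" stream.
class AsciiWriteFilter {
    /** Unicode replacement character, used for values we refuse to store. */
    static final char UNKNOWN = '\uFFFD';

    /** Strategy 1: keep only the eight low-order bits. */
    static char maskLowByte(int b) {
        return (char) (b & 0xff);
    }

    /** Strategy 2: replace anything outside 0..255 with U+FFFD. */
    static char replaceOutOfRange(int b) {
        return (b >= 0 && b <= 255) ? (char) b : UNKNOWN;
    }

    public static void main(String[] args) {
        int[] samples = {0x41, 0xE5, 0x1F6, -1};
        for (int s : samples) {
            System.out.printf("%d -> mask U+%04X, replace U+%04X%n",
                    s, (int) maskLowByte(s), (int) replaceOutOfRange(s));
        }
    }
}
```

The two strategies agree for every value in 0..255 and differ only outside that range, which is exactly where a policy decision is needed.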

Then the question is: which values are considered "non-ASCII"?
According to JDBC, an ASCII value is between 0 and 255, inclusive. Represented as a byte, the
upper part of this range (128-255) shows up as negative values. I assume ISO-8859-1 is the
encoding standard to be used, and further that these values will be mapped directly into Unicode.
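As an illustration of the byte representation and the direct mapping assumed above, the following sketch decodes single bytes with the standard ISO-8859-1 charset from the JDK. The class name Latin1Mapping is made up for this example.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: shows the signed-byte view of values above 127 and
// the one-to-one mapping from ISO-8859-1 bytes to Unicode code points.
class Latin1Mapping {
    /** Decode one byte as ISO-8859-1 and return the resulting code point. */
    static int decode(byte b) {
        return new String(new byte[] {b}, StandardCharsets.ISO_8859_1).charAt(0);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xE5;            // 229 as an int
        System.out.println(b);           // signed view of the same bits
        System.out.println(b & 0xff);    // unsigned view
        System.out.println(decode(b));   // code point after decoding
    }
}
```

With this decoder every byte value maps to the code point with the same numeric value, so no lookup table is involved.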

Say the Tamil letter with Unicode value '\u0B88' is written to the stream returned by Clob.setAsciiStream(1)
with OutputStream.write(int). Should we do "if i > 255 write '\uFFFD'", or should we ignore
the higher bits and say this is value 136 (this is mentioned in the comment for ClobAsciiStream),
which happens to be an unused code in ISO-8859-1?
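The arithmetic for the Tamil letter can be checked with a few lines of Java. This is only a sketch with a made-up class name. Note that the JDK's ISO-8859-1 decoder maps byte 136 to U+0088, a C1 control character, rather than rejecting it.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: what happens to U+0B88 if the higher bits are dropped.
class TamilLowByte {
    /** Keep the eight low-order bits of the value. */
    static int lowByte(int value) {
        return value & 0xff;
    }

    /** Decode a value in 0..255 as a single ISO-8859-1 byte. */
    static char decodeLatin1(int b) {
        return new String(new byte[] {(byte) b}, StandardCharsets.ISO_8859_1).charAt(0);
    }

    public static void main(String[] args) {
        int masked = lowByte('\u0B88');
        char decoded = decodeLatin1(masked);
        System.out.printf("masked=%d decoded=U+%04X control=%b%n",
                masked, (int) decoded, Character.isISOControl(decoded));
    }
}
```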

When/if the unknown character code is stored internally (\uFFFD), it must be converted to
'?' if it is read back using getAsciiStream (returns an InputStream).
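One way to get the '?' substitution on the read side, sketched here with plain JDK calls and a hypothetical class name, is to encode the stored characters with ISO-8859-1. String.getBytes(Charset) replaces unmappable characters with the charset's default replacement byte, which is '?' for ISO-8859-1. How Derby's getAsciiStream actually does (or should do) this is not shown here.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: unmappable characters become '?' when encoding.
class ReplacementOnRead {
    /** Encode stored text as single-byte values for an "ASCII" stream. */
    static byte[] toAsciiBytes(String stored) {
        return stored.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] out = toAsciiBytes("a\uFFFDb");
        for (byte b : out) {
            System.out.print((char) b);
        }
        System.out.println();
    }
}
```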

The default behavior for Writer.write(int) is to cast the int to char and then call
the abstract method write(char[],int,int). (OutputStream.write(int) is itself abstract.)
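The cast-to-char default described here is the one found in java.io.Writer.write(int), which keeps the 16 low-order bits of the argument. A small sketch, using a hypothetical Writer subclass that implements only the abstract methods, shows the effect:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical Writer that records what reaches write(char[],int,int),
// so the inherited write(int) behavior can be observed.
class CapturingWriter extends Writer {
    final StringBuilder captured = new StringBuilder();

    @Override
    public void write(char[] cbuf, int off, int len) {
        captured.append(cbuf, off, len);
    }

    @Override
    public void flush() { }

    @Override
    public void close() { }

    /** Write a single int through the inherited write(int) and return the result. */
    static String writeInt(int c) {
        CapturingWriter w = new CapturingWriter();
        try {
            w.write(c);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return w.captured.toString();
    }

    public static void main(String[] args) {
        // The bits above the low 16 are dropped by the cast to char.
        System.out.println(writeInt(0x10B88).equals("\u0B88"));
    }
}
```

So a value such as 0x0B88 passes through a Writer unchanged, and any narrowing to 0..255 has to happen before or after that step.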

No matter what the answers to the questions above are, Derby should not fail with a UTFDataFormatException
when reading data you have already been allowed to insert.

I'm on thin ice as to how these issues should be handled correctly, and I'm sure there are
more, so please correct me and add additional information.

> EmbedClob.setAsciiStream does not handle non-ascii characters correctly
> -----------------------------------------------------------------------
>
>                 Key: DERBY-2618
>                 URL: https://issues.apache.org/jira/browse/DERBY-2618
>             Project: Derby
>          Issue Type: Bug
>          Components: JDBC
>    Affects Versions: 10.3.0.0
>            Reporter: Kristian Waagan
>         Assigned To: Kristian Waagan
>         Attachments: Derby2618BugsInClobAsciiStream.java
>
>
> If non-ascii characters are written to the OutputStream returned by EmbedClob.setAsciiStream,
> Derby fails with a 'java.io.UTFDataFormatException' when the CLOB value is read back.
> I'm filing this bug with 'Major' priority, as the bug does not manifest itself when entering
> data, just when you try to get it back. Apart from filtering the data yourself before entering
> it, I don't think there is any workaround.
> Sample stack trace from a modified test:
> 1) testClobAsciiWrite1ParamKRISTIWAA(org.apache.derbyTesting.functionTests.tests.jdbcapi.LobStreamsTest)java.sql.SQLException: Unable to set stream: 'java.io.UTFDataFormatException'.
>         at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:95)
>         at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Util.java:88)
>         at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Util.java:94)
>         at org.apache.derby.impl.jdbc.Util.setStreamFailure(Util.java:246)
>         at org.apache.derby.impl.jdbc.EmbedClob.length(EmbedClob.java:190)
>         at org.apache.derby.impl.jdbc.EmbedPreparedStatement.setClob(EmbedPreparedStatement.java:1441)
>         at org.apache.derbyTesting.functionTests.tests.jdbcapi.LobStreamsTest.testClobAsciiWrite1ParamKRISTIWAA(LobStreamsTest.java:255)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:88)
> Caused by: java.sql.SQLException: Unable to set stream: 'java.io.UTFDataFormatException'.
>         at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:45)
>         at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(SQLExceptionFactory40.java:135)
>         at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:70)
>         ... 22 more

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

