db-derby-dev mailing list archives

From Kathey Marsden <kmarsdende...@sbcglobal.net>
Subject Client decoding of server Fdoca data, which direction do we want to go?
Date Fri, 10 Feb 2006 00:10:49 GMT
For DERBY-900, "Remove use of String(byte[]) and String(byte[], int, int)
constructors in network client leading to non-portable behaviour":

I am looking at this method in org.apache.derby.client.am.Sqlca, which
needs to create the string with the proper encoding.

private String bytes2String(byte[] bytes, int offset, int length)
            throws java.io.UnsupportedEncodingException {
        return new String(bytes, offset, length);
    }

In this case the Sqlca has read a ccsid that it stores, and that ccsid
needs to be translated into a Java encoding in order to create the String
properly.  The client is not currently equipped to make that translation,
but in fact the translation is always going to turn out to be "UTF-8",
because the server always sends Fdoca data in UTF-8 encoding.  I could
easily fix the bug by just hard coding "UTF-8" in there, but I think that,
as Dan pointed out, the client is being a bit deceptive about what it
knows by passing the encoding and ccsid around the way it does and, of
course, always coming up with the "UTF-8" answer in the end (or, in this
case, coming up with no answer at all and having a bug).
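
To make the quick fix concrete, here is a minimal sketch of what hard
coding the encoding would look like (just an illustration of the idea,
not a committed patch):

    // Decode with the encoding the server actually uses for Fdoca data,
    // instead of the JVM default encoding.
    private String bytes2String(byte[] bytes, int offset, int length)
            throws java.io.UnsupportedEncodingException {
        return new String(bytes, offset, length, "UTF-8");
    }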

The big question I guess in deciding how to fix this bug is:  What
direction do we want to go with client decoding the Fdoca data?  

1) We can have a client that fesses up that it knows the answer. In that
case I'd say we add static variables to Configuration.java for the
server encoding, reference them in this case, and file a Jira to clean up
a lot of unneeded, uncovered, and potentially buggy code in the client.

2) We can have a complete DRDA AR that knows how to do all the proper
translations, which means we bring the CharacterEncodings class in from
Network Server to fix this bug and start adding code to the client to do
all the translations properly.

After looking at this for a while, I think I would vote for 1, even
though I fixed DERBY-877 going in the other direction.  I think having
a lot of code that can never be covered is not good.  Derby Client is
for Derby; it should be optimized for that, and it can be made smaller,
cleaner, and less deceptive even if it supports a smaller subset of DRDA.
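
For reference, a rough sketch of what option 1 could look like; the
constant name is just an assumption for illustration:

    // In Configuration.java: the one encoding Network Server ever uses
    // for the Fdoca data it sends (hypothetical constant name).
    public final static String SERVER_FDOCA_ENCODING = "UTF-8";

bytes2String() and the other decoding spots would then reference that
constant directly, and the ccsid-to-encoding translation machinery could
be removed in the cleanup Jira.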


Here are some code examples to illustrate the current state:

Network Server always sends Fdoca data in UTF-8, CCSID 1208, Windows or
mainframe, rain or shine.
From NetworkServerControlImpl:
    protected final static int CCSIDSBC = 1208; // use UTF8
    protected final static int CCSIDMBC = 1208; // use UTF8
    protected final static String DEFAULT_ENCODING = "UTF8"; // use UTF8 for writing


The client is conflicted about whether it knows this or not.
In Typedef.java, updateColumn() updates the columns with the ccsid
sent from the server. It goes through a fair number of different code
branches until it finally reveals that it knows the answer, *UTF-8*, e.g.

String getCcsidMbcEncoding() throws DisconnectException {
        if (ccsidMbcEncoding_ == null) {
            ccsidMbcEncoding_ = UTF8ENCODING;
        }
        return ccsidMbcEncoding_;
    }


In other places it makes optimizations based on CCSIDs we never send,
which seem sort of wrong anyway, e.g. in Cursor.java:

    if (ccsid_[column-1] == 1200)
        return getStringWithoutConvert(columnDataPosition_[column-1] + 2,
            columnDataComputedLength_[column-1] - 2, fdocaLength_[column-1]);


In still other places, like the code cited in DERBY-900, it converts
according to the JVM default encoding, which was causing failures on some
platforms.
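
As a small aside illustrating the portability problem (hypothetical
bytes, exception handling omitted):

    // SQLERRMC bytes as the server sends them: always UTF-8 encoded.
    byte[] serverBytes = "caf\u00e9".getBytes("UTF-8");

    // Uses the JVM default encoding, so the result differs by platform.
    String platformDependent = new String(serverBytes);

    // Decodes with the encoding the server actually used, same everywhere.
    String correct = new String(serverBytes, "UTF-8");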

Note: I think Network Server does need to determine the client encoding
for Fdoca data it reads.  This is because it serves many clients,
including ODBC, and needs to be a bit more flexible in this regard.



