db-derby-dev mailing list archives

From "Tiago R. Espinha (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-4827) Modify the documentation for the 10.7 release regarding the UTF-8 CCSID manager
Date Fri, 01 Oct 2010 16:00:40 GMT

    [ https://issues.apache.org/jira/browse/DERBY-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916924#action_12916924 ]

Tiago R. Espinha commented on DERBY-4827:

Hi Kim,

Apologies for not having provided more info to begin with.

This will probably have to be a continuous effort even after the 10.7 release, because the
references to RDBNAM and other fields (pretty much any DRDA command is affected by this) aren't
always explicit and it might take a while to find them all. However, now that I think about
it, from a user's point of view it might indeed be just these three fields: database name,
username and password.

The URL you mentioned doesn't seem to contain anything that needs changing. We only need to
change references to EBCDIC (until now it was the only encoding available for the database
name, username and password; we now support UTF-8) and to the 255-byte length limit, which
no longer always translates to 255 characters.

Kathey found this reference that requires changing: http://db.apache.org/derby/docs/dev/adminguide/cadminappsclient.html

Here, it reads:
"For both driver and DataSource access, the database name (including path), user, password
and other attribute values must consist of single-byte characters that can be converted to
EBCDIC. The total byte length of the database name plus attributes when converted to EBCDIC
must not exceed 255 bytes. You may be able to work around this restriction for long paths
or paths that include multibyte characters by setting the derby.system.home system property
when starting Network Server and accessing the database with a relative path that is shorter
and does not include multibyte characters."

This is mostly wrong now. Those three attribute values can consist of any character that can
be converted to UTF-8. The 255-byte limit still exists, but it would be good to mention
that in UTF-8 it does not always translate to 255 characters: a value containing multibyte
characters reaches the limit with fewer than 255 characters.
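
To illustrate the difference between character count and byte count, here is a minimal
sketch in plain Java. It is not Derby code: the class name, the helper methods and the
sample names are made up for illustration, and it assumes the 255-byte cap is checked
against a single string, which may not match how the driver combines the database name
and attributes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Derby: compares character count and
// UTF-8 byte count for candidate database names.
public class Utf8LengthCheck {

    // The 255-byte cap discussed above (assumed here to apply to one string).
    static final int MAX_BYTES = 255;

    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the UTF-8 encoding of the string fits within the cap.
    static boolean fitsInLimit(String s) {
        return utf8Length(s) <= MAX_BYTES;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only:
        // "donn\u00e9es" contains one accented Latin letter,
        // "\u6570\u636e\u5e93" consists of three Chinese characters.
        String[] samples = { "salesdb", "donn\u00e9es", "\u6570\u636e\u5e93" };
        for (String name : samples) {
            int chars = name.codePointCount(0, name.length());
            System.out.println(name + ": " + chars + " characters, "
                    + utf8Length(name) + " bytes, fits=" + fitsInLimit(name));
        }
    }
}
```

With a check like this, a name made only of ASCII characters can use all 255 characters,
while a name made of 3-byte characters runs out of room after 85 of them.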

I'll try to find more references to EBCDIC in the documentation - anything mentioning EBCDIC
will probably require some slight changes. If I find anything, I'll post it here.


> Modify the documentation for the 10.7 release regarding the UTF-8 CCSID manager
> -------------------------------------------------------------------------------
>                 Key: DERBY-4827
>                 URL: https://issues.apache.org/jira/browse/DERBY-4827
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions:
>            Reporter: Tiago R. Espinha
> With the introduction of UTF-8 support in the client driver (DERBY-728), the documentation
> regarding the length of the arguments (RDBNAM, USRID, etc.) will become misleading.
> On the list, Kathey has identified [1] one such spot. Before releasing, we should
> try to find any other occurrences and fix them accordingly. Please note that UTF-8 is
> a variable-length encoding and, since we are maintaining the 255-byte length cap,
> the length in characters will now be variable.
> Regular ASCII characters still take 1 byte, Latin and other extended characters take
> 2 bytes, Chinese characters take 3 bytes and some special characters take 4 bytes. [2]
> [1] - http://old.nabble.com/Database-name-length-tt29691419.html
> [2] - http://www.utf8-chartable.de/

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.