db-derby-dev mailing list archives

From Knut Anders Hatlen <knut.hat...@oracle.com>
Subject Re: Database name length
Date Mon, 13 Sep 2010 10:05:59 GMT
Tiago Espinha <tiago.derby@yahoo.co.uk> writes:

> Is this an okay behavior? Or would it be preferable to impose a more strict 
> limit where we assume that all characters take 4 bytes (worst case scenario in 
> UTF-8) and **always** cap the dbname length at 63 characters (255 bytes / 4 
> bytes)? This would mean more work for my implementation and possibly an 
> exclusion from 10.7. On the other hand, if we have this variable-length limit 
> depending on the type of characters used, we should probably have some sort of 
> release note alerting people about this fact.

Hi Tiago,

Let me see if I've understood this problem. Please correct me if I've
got it wrong.

Currently, the network protocol supports database names that take up to
255 bytes when encoded in EBCDIC. We don't allow any characters not
supported by EBCDIC. Since EBCDIC supports mostly the same set of
characters as ISO-8859-1, it means that we allow database names up to
255 characters from the Unicode range 0x00-0xff.

With the change to UTF-8, we get the following situation:

1) Database names which only contain US-ASCII characters (Unicode range
0x00-0x7f) still have a maximum length of 255 characters.

2) Database names which only contain ISO-8859-1 characters, some of
which are not in the US-ASCII range, get a maximum length lower than 255
(exact limit depends on the number of non-ASCII characters), because
UTF-8 encodes the non-ASCII characters in two bytes.

3) Database names which contain characters outside of the ISO-8859-1
range will be supported, but with a lower maximum length than 255
characters (exact limit depends on the characters used).
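The three cases above can be checked with a few lines of Java. This is only a sketch: the class and method names are hypothetical and not part of Derby, and the 255-byte limit is taken from the discussion above rather than from the driver source.

```java
// Illustrative sketch; DbNameUtf8Length and its methods are hypothetical
// helpers, not Derby APIs.
final class DbNameUtf8Length {
    // Limit on the encoded database name, in bytes (assumed from the thread).
    static final int MAX_ENCODED_BYTES = 255;

    // Number of bytes the name occupies once encoded as UTF-8.
    static int utf8Length(String name) {
        return name.getBytes(java.nio.charset.StandardCharsets.UTF_8).length;
    }

    // True if the UTF-8 encoding of the name fits within the byte limit.
    static boolean fits(String name) {
        return utf8Length(name) <= MAX_ENCODED_BYTES;
    }

    // Builds a name consisting of n copies of s (used to probe the limit).
    static String repeat(String s, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(s);
        }
        return sb.toString();
    }
}
```

With these helpers, 255 copies of "a" fit, but only 127 copies of U+00E9 (two bytes each) do, and only 63 copies of a character outside the Basic Multilingual Plane (four bytes each).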

(1) is not a change from previous versions, so that should be
fine. Since we didn't allow any characters outside of ISO-8859-1 before,
the change in (3) is an improvement, so I think it's fine too.

The problematic issue is (2), since existing applications that rely on
the ability to create long database names using characters from the
entire ISO-8859-1 range may now be unable to connect to the database
using the client driver. This would be a functional regression, so we
will need a release note that explains how to work around this issue.

Does the above description sound about right?

The pragmatic approach would be to increase the maximum length. I see
that the writeScalarString() method that we use to write the RDBNAM
token uses two bytes for the length:

        // now write the length.  We have the string byte length plus
        // 4 bytes, 2 for length and 2 for codepoint.
        int totalLength = stringByteLength + 4;
        bytes_[lengthOffset] = (byte) ((totalLength >>> 8) & 0xff);
        bytes_[lengthOffset + 1] = (byte) ((totalLength) & 0xff);

So it seems to me we have enough length bits to allow database names up
to 2^16-1-4 == 65531 bytes. I cannot think of any problems that such
a change would cause. And I believe it would have a much smaller risk of
affecting existing applications than the suggestion to limit all
database names to 63 characters.
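To make the arithmetic concrete, here is a small sketch of the same two-byte, big-endian length encoding. The class and method names are hypothetical, and it assumes the field is treated as a plain unsigned 16-bit value with no reserved bits, which is an assumption about the protocol rather than something confirmed here. Under that assumption the largest total length is 65535, which leaves 65531 bytes for the encoded string.

```java
// Illustrative sketch; TwoByteLengthSketch is a hypothetical helper,
// not the actual Derby writeScalarString() implementation.
final class TwoByteLengthSketch {
    // 2 bytes for the length field plus 2 bytes for the codepoint.
    static final int HEADER_BYTES = 4;
    // Largest value an unsigned two-byte field can hold (assumes no reserved bits).
    static final int MAX_TOTAL_LENGTH = 0xFFFF;

    // Encodes (stringByteLength + 4) as two big-endian bytes.
    static byte[] encodeTotalLength(int stringByteLength) {
        int totalLength = stringByteLength + HEADER_BYTES;
        if (stringByteLength < 0 || totalLength > MAX_TOTAL_LENGTH) {
            throw new IllegalArgumentException(
                    "string byte length out of range: " + stringByteLength);
        }
        return new byte[] {
            (byte) ((totalLength >>> 8) & 0xff),
            (byte) (totalLength & 0xff)
        };
    }

    // Decodes the two big-endian bytes back into the total length.
    static int decodeTotalLength(byte[] lengthBytes) {
        return ((lengthBytes[0] & 0xff) << 8) | (lengthBytes[1] & 0xff);
    }
}
```

A 255-byte name round-trips as a total length of 259, and 65531 is the largest string byte length the sketch accepts.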

As to the possibility of a discrepancy between the maximum length in
client mode and embedded mode, I think we already have such a
discrepancy. The file system limit that prevents use of more than 255
characters in a database name in embedded mode applies to each
component of the path name. The total length of the path in the URL may
exceed 255 characters as long as none of the directory names in the path
exceeds 255 characters.

The 255-character limit in the network client, on the other hand,
applies to the entire path in the URL, not to each component of the
path. Also, the network client will take any connection attributes (like
create=true) as part of the database name, whereas the embedded driver
will not. Increasing the maximum length accepted by the network client
should make it less likely that someone gets bitten by this difference
between the drivers.
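The difference between the two checks can be sketched as follows. The names are hypothetical and the code is not taken from either driver. It counts characters as the description above does, whereas real file system limits are platform-dependent and often measured in bytes.

```java
// Illustrative sketch; NameLimitSketch is a hypothetical helper that only
// contrasts the two ways of applying a 255 limit described above.
final class NameLimitSketch {
    static final int LIMIT = 255;

    // Embedded-style check: each path component is measured on its own.
    static boolean fitsPerComponent(String path) {
        for (String component : path.split("/")) {
            if (component.length() > LIMIT) {
                return false;
            }
        }
        return true;
    }

    // Client-style check: the whole string, including any connection
    // attributes appended to it, is measured as one unit.
    static boolean fitsAsWhole(String pathWithAttributes) {
        return pathWithAttributes.length() <= LIMIT;
    }
}
```

A path made of three 100-character directory names passes the per-component check but fails the whole-string check, even before ";create=true" is appended.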

-- 
Knut Anders
