db-derby-dev mailing list archives

From Tiago Espinha <tiago.de...@yahoo.co.uk>
Subject Re: Database name length
Date Mon, 13 Sep 2010 10:49:09 GMT
Thank you Knut, for your reply.

Your point #1 is correct. As for points #2 and #3, just a small correction: it 
is all characters falling outside the *US-ASCII* encoding that will get a maximum 
length lower than 255 characters, as anything other than ASCII requires more than 
one byte to encode in UTF-8. I'm fairly sure that at this point we do not support 
ISO-8859-1 through the client driver, as the extended characters (áéó etc.) 
fall outside US-ASCII. So hopefully this won't break anything, as we didn't 
support these characters previously.

As for your suggestion of increasing the length of the field, I'm not sure 
that's an option. This length limitation is imposed by the DRDA specification, 
and the ACR unfortunately didn't change this. The ACR reads: "As of DDM 
Level 7, the RDBNAM can accommodate an RDB name of up to 255 bytes in length, 
and its format will vary depending on the length of the RDB name". So 
essentially, we could easily support a much larger RDB name in Derby, but the 
specification forbids it.
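
To make the byte-versus-character distinction concrete, here is a minimal Java sketch. The class `RdbnamLimit`, the constant `MAX_RDBNAM_BYTES` and the helper `fitsInRdbnam` are hypothetical names invented for illustration and are not Derby code; the only thing taken from the discussion is the 255-byte ceiling, applied here to the UTF-8 encoding of the name.

```java
import java.nio.charset.StandardCharsets;

public class RdbnamLimit {
    // 255-byte ceiling quoted from the ACR in the message above.
    public static final int MAX_RDBNAM_BYTES = 255;

    // Hypothetical helper: true if the name's UTF-8 encoding fits the limit.
    public static boolean fitsInRdbnam(String dbName) {
        return dbName.getBytes(StandardCharsets.UTF_8).length <= MAX_RDBNAM_BYTES;
    }

    // Builds a string made of 'count' copies of the character 'c'.
    public static String repeat(char c, int count) {
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ascii = repeat('a', 255);         // 255 chars, 1 byte each
        String accented = repeat('\u00e9', 128); // 128 chars, 2 bytes each
        System.out.println(fitsInRdbnam(ascii));
        System.out.println(fitsInRdbnam(accented));
    }
}
```

Under these assumptions a name of 255 ASCII characters fits, while 128 copies of é (256 bytes in UTF-8) does not, even though it is much shorter in characters.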

You're right about the current discrepancy in lengths, so it should be fairly 
OK for the client limit to differ from the embedded one. I think this is 
definitely something that should be properly documented though, as it will be 
odd behavior from the point of view of an end user, who might be oblivious to 
the byte length limitation and character encoding.


----- Original Message ----
From: Knut Anders Hatlen <knut.hatlen@oracle.com>
To: derby-dev@db.apache.org
Sent: Mon, 13 September, 2010 11:05:59
Subject: Re: Database name length

Tiago Espinha <tiago.derby@yahoo.co.uk> writes:

> Is this an okay behavior? Or would it be preferable to impose a more strict 
> limit where we assume that all characters take 4 bytes (worst case scenario in 
> UTF-8) and **always** cap the dbname length at 63 characters (255 bytes / 4 
> bytes)? This would mean more work for my implementation and possibly an 
> exclusion from 10.7. On the other hand, if we have this variable-length limit 
> depending on the type of characters used, we should probably have some sort of 
> release note alerting people about this fact.

Hi Tiago,

Let me see if I've understood this problem. Please correct me if I've
got it wrong.

Currently, the network protocol supports database names that take up to
255 bytes when encoded in EBCDIC. We don't allow any characters not
supported by EBCDIC. Since EBCDIC supports mostly the same set of
characters as ISO-8859-1, it means that we allow database names up to
255 characters from the Unicode range 0x00-0xff.

With the change to UTF-8, we get the following situation:

1) Database names which only contain US-ASCII characters (Unicode range
0x00-0x7f) still have a maximum length of 255 characters.

2) Database names which only contain ISO-8859-1 characters, some of
which are not in the US-ASCII range, get a maximum length lower than 255
(exact limit depends on the number of non-ASCII characters), because
UTF-8 encodes the non-ASCII characters in two bytes.

3) Database names which contain characters outside of the ISO-8859-1
range will be supported, but with a lower maximum length than 255
characters (exact limit depends on the characters used).
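
The three cases can be illustrated with a small Java sketch that reports, for one sample name per case, the character count, the UTF-8 byte count, and whether the name stays within the ISO-8859-1 range. The class `NameEncodingSizes`, its methods and the sample names are invented for illustration; none of this is taken from the Derby code base.

```java
import java.nio.charset.StandardCharsets;

public class NameEncodingSizes {
    // Number of bytes the name occupies when encoded as UTF-8.
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if every character can be encoded in ISO-8859-1 (0x00-0xff).
    public static boolean isLatin1(String s) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String[] names = {
            "salesdb",                 // case 1: US-ASCII only
            "caf\u00e9db",             // case 2: ISO-8859-1, one non-ASCII char
            "\u65e5\u672c\u8a9edb"     // case 3: outside ISO-8859-1
        };
        for (String name : names) {
            System.out.println(name.length() + " chars, "
                    + utf8Length(name) + " UTF-8 bytes, ISO-8859-1: "
                    + isLatin1(name));
        }
    }
}
```

Each non-ASCII ISO-8859-1 character costs one extra byte in UTF-8, and the three CJK characters in the last sample cost three bytes each, which is why the effective character limit shrinks as such characters are added.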

(1) is not a change from previous versions, so that should be
fine. Since we didn't allow any characters outside of ISO-8859-1 before,
the change in (3) is an improvement, so I think it's fine too.

The problematic issue is (2), since existing applications that rely on
the ability to create long database names using characters from the
entire ISO-8859-1 range may now be unable to connect to the database
using the client driver. This will be a functional regression, so we
will need a release note that explains how to work around this issue.

Does the above description sound about right?

The pragmatic approach would be to increase the maximum length. I see
that the writeScalarString() method that we use to write the RDBNAM
token, uses two bytes for the length:

        // now write the length.  We have the string byte length plus
        // 4 bytes, 2 for length and 2 for codepoint.
        int totalLength = stringByteLength + 4;
        bytes_[lengthOffset] = (byte) ((totalLength >>> 8) & 0xff);
        bytes_[lengthOffset + 1] = (byte) ((totalLength) & 0xff);

So it seems to me we have enough length bits to allow database names up
to 2^16-4 == 65532 characters. I cannot think of any problems that such
a change would cause. And I believe it would have a much smaller risk of
affecting existing applications than the suggestion to limit all
database names to 63 characters.
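
The two-byte length encoding quoted above can be tried in isolation with the following sketch. The class `TwoByteLength` and its methods are hypothetical stand-ins written for this illustration; they mirror only the shift-and-mask arithmetic from the quoted snippet and do not model the rest of writeScalarString() or any other DRDA framing rules.

```java
public class TwoByteLength {
    // Encodes (stringByteLength + 4) big-endian into two bytes,
    // following the arithmetic in the snippet quoted above.
    public static byte[] encodeLength(int stringByteLength) {
        int totalLength = stringByteLength + 4;
        return new byte[] {
            (byte) ((totalLength >>> 8) & 0xff),
            (byte) (totalLength & 0xff)
        };
    }

    // Reads the two bytes back and subtracts the 4 bytes of overhead
    // (2 for the length itself and 2 for the codepoint).
    public static int decodeLength(byte[] header) {
        int totalLength = ((header[0] & 0xff) << 8) | (header[1] & 0xff);
        return totalLength - 4;
    }

    public static void main(String[] args) {
        byte[] header = encodeLength(255);
        // 255 + 4 = 259 = 0x0103
        System.out.printf("%02x %02x%n", header[0], header[1]);
        System.out.println(decodeLength(header));
    }
}
```

A 255-byte name yields the header bytes 01 03, so the high byte is already in use at the current limit and the field has room for considerably longer values.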

As to the possibility for a discrepancy between the maximum length in
client mode and embedded mode, I think we already have such a
discrepancy. The file system limit that prevents use of more than 255
characters in a database name in embedded mode, applies to each
component of the path name. The total length of the path in the URL may
exceed 255 characters if none of the directory names in the path exceed
255 characters.

The 255-character limit in the network client, on the other hand,
applies to the entire path in the URL, not to each component of the
path. Also, the network client will take any connection attributes (like
create=true) as part of the database name, whereas the embedded driver
will not. Increasing the maximum length accepted by the network client
should make it less likely that someone gets bitten by this difference
between the drivers.

Knut Anders

