db-derby-dev mailing list archives

From: Tiago Espinha <tiago.de...@yahoo.co.uk>
Subject: Database name length
Date: Sun, 12 Sep 2010 16:22:42 GMT

Hello all,

I've been doing some boundary testing for my implementation of DERBY-728
and I've found something that I'd like to discuss here on the list.

In embedded mode we currently support all kinds of characters. In this mode,
the database name length is limited to 255 under Windows, as this is an OS
limitation. I'm not sure about the behavior on other OSes, but what I've
noticed is that this limit is applied at the character level. I'm not sure
whether Derby applies a limit of its own at all in embedded mode, since we're
already capped at 255 by Windows.

This means that in embedded mode, I can have a database name made up of 255
'ç' characters. However, the 'ç' character takes up 2 bytes in UTF-8, and
when we move to client/server mode, the 255 length limit is applied to
bytes and not characters (as specified by the DRDA specs and the ACR 7007).
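
To make the character/byte mismatch concrete, here is a minimal sketch in plain
Java. It is not Derby code: the class and method names (NameLengthDemo,
charCount, utf8ByteCount) are hypothetical, and counting "characters" as
Unicode code points is an assumption made for illustration.

```java
import java.nio.charset.StandardCharsets;

final class NameLengthDemo {
    // Characters counted as Unicode code points (an assumption for this sketch).
    static int charCount(String name) {
        return name.codePointCount(0, name.length());
    }

    // Bytes the name occupies once encoded as UTF-8.
    static int utf8ByteCount(String name) {
        return name.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // 255 copies of 'ç' (U+00E7); String.repeat needs Java 11 or later.
        String name = "\u00E7".repeat(255);
        System.out.println(charCount(name));     // prints 255
        System.out.println(utf8ByteCount(name)); // prints 510
    }
}
```

The same name is 255 characters long but 510 bytes long, so it passes a
character-based check and fails a byte-based one.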

In practice, we will now have a discrepancy in name length limits. Until now,
we had a 255 character limit in both modes of operation. In embedded mode we
only care about characters, and in client/server mode everything was ASCII (or
rather, EBCDIC), so 1 character equalled 1 byte and the limit was the same in
both cases.

However, with the new CCSID manager, which allows UTF-8 characters in
client/server mode, things will change slightly. The 255 byte limit still
applies, as it is defined by the DRDA protocol, but a character may now take
more than 1 byte. I say "may" deliberately: in UTF-8, the length in bytes of
each character is variable. ASCII characters still take just 1 byte to encode,
accented Latin characters take 2 bytes, most Chinese characters take 3 bytes,
and characters outside the Basic Multilingual Plane take 4 bytes.
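
The variable widths can be checked with a short sketch like the one below.
Utf8Widths and utf8ByteCount are hypothetical names, and the sample characters
(U+00E7, U+4E2D, U+1F600) are just one pick from each width class.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Widths {
    // Bytes needed to encode the given string as UTF-8.
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8ByteCount("a"));            // ASCII: prints 1
        System.out.println(utf8ByteCount("\u00E7"));       // 'ç' (U+00E7): prints 2
        System.out.println(utf8ByteCount("\u4E2D"));       // Chinese U+4E2D: prints 3
        // U+1F600 lies outside the Basic Multilingual Plane, so Java stores
        // it as a surrogate pair; it is still one character in UTF-8 terms.
        System.out.println(utf8ByteCount("\uD83D\uDE00")); // prints 4
    }
}
```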

What this all means is that there is no single limit in characters that we can
"advertise" as a cap for the dbname. Until now we could say that Derby imposes
a 255 character limit on database names under client/server, but from now on
the limit in characters will vary. If we use ONLY characters like 'áèç', the
limit will actually be 127 (2 * 127 = 254 bytes, and another 2-byte character
would overrun the limit). But we can also use, for example, 249 ASCII
characters and 2 Chinese characters, for a total of 251 characters
(249 + 2 * 3 = 255 bytes, exactly reaching the limit).
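
The arithmetic above can be reproduced with a byte-based check along these
lines. This is only a sketch: DrdaNameLimit, MAX_NAME_BYTES and fitsByteLimit
are hypothetical names, not the actual DERBY-728 implementation.

```java
import java.nio.charset.StandardCharsets;

final class DrdaNameLimit {
    // The 255 byte cap discussed in this message.
    static final int MAX_NAME_BYTES = 255;

    // True if the UTF-8 encoding of the name fits within the byte cap.
    static boolean fitsByteLimit(String name) {
        return name.getBytes(StandardCharsets.UTF_8).length <= MAX_NAME_BYTES;
    }

    public static void main(String[] args) {
        String c = "\u00E7"; // 'ç', 2 bytes in UTF-8 (String.repeat needs Java 11+)
        System.out.println(fitsByteLimit(c.repeat(127))); // 254 bytes: prints true
        System.out.println(fitsByteLimit(c.repeat(128))); // 256 bytes: prints false

        // 249 ASCII characters plus 2 Chinese characters: 249 + 2 * 3 = 255 bytes.
        String mixed = "a".repeat(249) + "\u4E2D\u6587";
        System.out.println(mixed.codePointCount(0, mixed.length())); // prints 251
        System.out.println(fitsByteLimit(mixed));                    // prints true
    }
}
```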

Is this behavior acceptable? Or would it be preferable to impose a stricter
limit where we assume that all characters take 4 bytes (the worst case in
UTF-8) and **always** cap the dbname length at 63 characters (255 bytes / 4
bytes per character, rounded down)? This would mean more work for my
implementation and possibly its exclusion from 10.7. On the other hand, if we
keep this variable-length limit that depends on the type of characters used,
we should probably have a release note alerting people to this fact.
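
For comparison, the stricter alternative could be sketched as follows.
StrictNameCap and its members are hypothetical names, and counting characters
as Unicode code points is again an assumption.

```java
final class StrictNameCap {
    static final int MAX_NAME_BYTES = 255;
    static final int MAX_UTF8_BYTES_PER_CHAR = 4;
    // Integer division rounds down: 255 / 4 = 63.
    static final int MAX_NAME_CHARS = MAX_NAME_BYTES / MAX_UTF8_BYTES_PER_CHAR;

    // True if the name has at most MAX_NAME_CHARS code points, which
    // guarantees its UTF-8 encoding is at most 63 * 4 = 252 bytes.
    static boolean fitsStrictCap(String name) {
        return name.codePointCount(0, name.length()) <= MAX_NAME_CHARS;
    }

    public static void main(String[] args) {
        System.out.println(MAX_NAME_CHARS);                // prints 63
        System.out.println(fitsStrictCap("a".repeat(63))); // prints true
        System.out.println(fitsStrictCap("a".repeat(64))); // prints false
    }
}
```

The trade-off is visible in the last line: a 64 character ASCII name is
rejected even though it only needs 64 bytes.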

Just wanted to get some thoughts and opinions on this...

Thanks,
Tiago


      

