db-derby-dev mailing list archives

From Tiago Espinha <tiago.de...@yahoo.co.uk>
Subject Re: Database name length
Date Mon, 13 Sep 2010 20:50:03 GMT
I agree, Kathey. The bottom line is that if we don't impose this 63-character 
limitation, then the limit will be variable. For instance, if you use **just** 
special Latin characters (e.g. áéçóí), the limit will be 127, which is 
essentially what happens right now, albeit in a much less elegant way. According 
to Knut's experiment, EBCDIC is able to encode these special characters, but it 
does seem to take more than one byte per character.

I tried to create a database with 243 special Latin characters (255 - 12 for 
;create=true) on a 10.5.3.0 server and it just threw a very nasty array bounds 
exception (check my other e-mail on the list).

Knut and Dag also suggested that we raise this limitation up to 0xFFFF (65535) 
characters as allowed by the two bytes with which we encode length. Would you 
agree with this approach?

Just to sum up: even if we don't raise the limit, it doesn't seem like my 
changes will break access to existing databases, as there is indeed a limit 
currently. The only issue is that if we use strictly Chinese characters, we 
will be capped at 85 characters (85 * 3 bytes = 255 bytes). Since we didn't 
allow Chinese characters on the client driver before, this might not be bad 
from a regression perspective, but for long paths it might be an issue (as it 
is even with other characters).
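
To make the arithmetic in this message concrete, here is a minimal Java sketch that works out how many characters of a given kind fit into a 255-byte budget when the name is encoded as UTF-8. The class and method names (DbNameBudget, utf8Length, maxChars) are made up for illustration and are not part of Derby; the 255-byte budget is simply the figure discussed in this thread.

```java
import java.nio.charset.StandardCharsets;

public class DbNameBudget {
    // Assumed byte budget, taken from the 255-byte figure in this thread.
    static final int MAX_BYTES = 255;

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // How many copies of one sample character fit into the budget.
    static int maxChars(String sampleChar) {
        return MAX_BYTES / utf8Length(sampleChar);
    }

    public static void main(String[] args) {
        System.out.println(maxChars("a"));            // ASCII, 1 byte each: 255
        System.out.println(maxChars("\u00e9"));       // Latin e-acute, 2 bytes: 127
        System.out.println(maxChars("\u4e2d"));       // a Chinese character, 3 bytes: 85
        System.out.println(maxChars("\uD83D\uDE00")); // supplementary character, 4 bytes: 63
    }
}
```

The last case is the worst-case assumption behind a fixed 63-character cap; the other three show why the limit varies if no fixed cap is imposed.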

Tiago



________________________________
From: Kathey Marsden <kmarsdenderby@sbcglobal.net>
To: derby-dev@db.apache.org
Cc: Tiago Espinha <tiago.derby@yahoo.co.uk>
Sent: Mon, 13 September, 2010 16:33:09
Subject: Re: Database name length

On 9/12/2010 9:22 AM, Tiago Espinha wrote: 
Is this an okay behavior? Or would it be preferable to impose a more strict  
limit where we assume that all characters take 4 bytes (worst case scenario in  
UTF-8) and **always** cap the dbname length at 63 characters (255 bytes / 4  
bytes)? This would mean more work for my implementation and possibly an  
exclusion from 10.7. On the other hand, if we have this variable-length limit  
depending on the type of characters used, we should probably have some sort of  
release note alerting people about this fact.  

Hi Tiago, 

I don't think we should introduce any new limiting factors on embedded as it 
may break existing applications. I am curious as to the existing limits you 
found with embedded on Windows. Does that include the path leading up to the 
database name and the attributes or just the final database name?

For network server we have this existing documentation which needs 
modification with the introduction of UNICODEMGR.

http://db.apache.org/derby/docs/dev/adminguide/cadminappsclient.html which says:


For both driver and DataSource access, the database name (including path), 
user, password and other attribute values must consist of single-byte 
characters that can be converted to EBCDIC. The total byte length of the 
database name plus attributes when converted to EBCDIC must not exceed 255 
bytes. You may be able to work around this restriction for long paths or 
paths that include multibyte characters by setting the derby.system.home 
system property when starting Network Server and accessing the database with 
a relative path that is shorter and does not include multibyte characters.


This should be modified to remove the single byte character restriction and 
change EBCDIC to UTF-8.
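
As a rough illustration of the relative-path workaround in the quoted documentation, the following Java sketch compares the UTF-8 byte length of a long absolute database path with that of a short relative name. It assumes the 255-byte limit on database name plus attributes still applies once the encoding is UTF-8; the class name ConnectionNameCheck, the helper fits, and the sample paths are hypothetical and not Derby APIs.

```java
import java.nio.charset.StandardCharsets;

public class ConnectionNameCheck {
    // Assumed limit on database name plus attributes, in bytes.
    static final int MAX_BYTES = 255;

    // True if the string fits within the limit when encoded as UTF-8.
    static boolean fits(String dbNameAndAttributes) {
        return dbNameAndAttributes.getBytes(StandardCharsets.UTF_8).length <= MAX_BYTES;
    }

    public static void main(String[] args) {
        String attrs = ";create=true";

        // Hypothetical deep directory tree with a multibyte character in each segment.
        StringBuilder longPath = new StringBuilder("/home/user");
        for (int i = 0; i < 30; i++) {
            longPath.append("/donn\u00e9es"); // 8 characters, 9 bytes in UTF-8
        }

        String absolute = longPath + "/mydb" + attrs; // full path sent by the client
        String relative = "mydb" + attrs;             // resolved against derby.system.home

        System.out.println(fits(absolute)); // false: 297 bytes
        System.out.println(fits(relative)); // true: 16 bytes
    }
}
```

With derby.system.home pointing at the long directory when Network Server is started, the client would only need to send the short relative name.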

Thanks

Kathey


      