db-derby-dev mailing list archives

From Tiago Espinha <tiago.de...@yahoo.co.uk>
Subject Re: Database name length
Date Tue, 14 Sep 2010 13:03:51 GMT
Hello everyone again,

I wanted to summarize where we stand on the name length and define a course of 
action. Unless someone objects, I will go ahead with this plan.

1) It is probably better to keep the issue of UTF-8 encoding and length of the 
RDBNAM separate. Because of this, I will go ahead and, after testing, commit my 
changes to put UTF-8 in place.

This means the maximum name length in characters will vary with the characters 
used, because the limit is counted in bytes. I think this is OK, provided the 
documentation is updated accordingly.

2) A new issue will be created to deal with the length of the RDBNAM field. I'm 
not sure how the OpenGroup works, so I was hoping someone with more experience 
would volunteer to try to get the limit lifted. Alternatively, we can handle 
this as an extension to DRDA. I'll leave that discussion to the new issue, so 
that it doesn't block the UTF-8 support.

3) The goal is obviously to not introduce regressions and to make sure we can 
still access old databases with Latin characters. I believe this is safe, 
because support for these characters is currently broken in the client driver. 
Knut has done some experiments on DERBY-4799 and I've also run some experiments 
of my own, only to find that, for example, I can't create a database with more 
than three Latin characters (on 10.5.3.0). So even though the limit for Latin 
characters will now be 127 characters, that is still an improvement over the 
current, broken behavior.
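
To make the variable limit concrete, here is a minimal sketch of how the 
character limit follows from a byte limit under UTF-8. The class and method 
names are hypothetical and not part of Derby, and the 255-byte figure is the 
limit assumed in this thread.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Derby code: measures a database name in UTF-8 bytes.
class Utf8NameLength {
    // Assumption taken from this thread: the name may occupy at most 255 bytes.
    static final int ASSUMED_MAX_BYTES = 255;

    // Number of bytes the name occupies once encoded as UTF-8.
    static int utf8Length(String name) {
        return name.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the encoded name fits within the assumed byte limit.
    static boolean fits(String name) {
        return utf8Length(name) <= ASSUMED_MAX_BYTES;
    }

    // Builds a string of n copies of c (avoids String.repeat for older JDKs).
    static String repeat(char c, int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u00e6 is the Latin letter ae (2 bytes in UTF-8),
        // \u4e2d is a Chinese character (3 bytes in UTF-8).
        System.out.println(utf8Length("a"));               // 1
        System.out.println(utf8Length("\u00e6"));          // 2
        System.out.println(utf8Length("\u4e2d"));          // 3
        System.out.println(fits(repeat('\u00e6', 127)));   // true  (254 bytes)
        System.out.println(fits(repeat('\u00e6', 128)));   // false (256 bytes)
        System.out.println(fits(repeat('\u4e2d', 85)));    // true  (255 bytes)
    }
}
```

Under these assumptions a name made only of special Latin characters tops out 
at 127 characters, and one made only of Chinese characters at 85.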

In this process I will also fix the bug Knut discovered. There is more 
information about this on the issue itself (DERBY-4799).

I think I've covered the main points. If anyone has comments, suggestions or 
concerns please feel free to chip in.

Thanks,
Tiago


----- Original Message ----
From: Knut Anders Hatlen <knut.hatlen@oracle.com>
To: derby-dev@db.apache.org
Sent: Tue, 14 September, 2010 9:46:00
Subject: Re: Database name length

Tiago Espinha <tiago.derby@yahoo.co.uk> writes:

> I agree Kathey. The bottom line is that if we don't impose this 63
> character limitation, then the limit will be variable. For instance,
> if you use **just** special Latin characters (i.e. áéçóí), the limit
> will be 127 which is essentially what happens right now albeit in a
> much less elegant way. EBCDIC according to Knut's experiment is able
> to encode these special characters but it does seem like it takes
> more than one byte.
>
> I tried to create a database with 243 special Latin characters (255 -
> 12 for ;create=true) on a 10.5.3.0 server and it just threw a very
> nasty array bounds exception (check my other e-mail on the list).

It turns out the current limit is not caused by EBCDIC, but rather some
faulty conversion to UTF-8 in the error handling, with the same root
cause as DERBY-4799. When I apply the fix for DERBY-4799 and try to
create a database whose name consists of 129 special Latin characters, I
now see this error:

ij> connect 
'jdbc:derby://localhost/æøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøå;create=true';

ERROR XJ041: DERBY SQL error: SQLCODE: -1, SQLSTATE: XJ041, SQLERRMC: Failed to 
create database 
'æøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøå',
 see the next exception for details.::SQLSTATE: XBM0HDirectory 
/tmp/server/æøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøå
 cannot be created.

So it seems it is actually a filesystem limitation (I use ZFS, which Dag
in an earlier posting said had a limit of 255 *bytes* - not chars - per
path component) that would also be seen with the embedded driver.

> Knut and Dag also suggested that we raise this limitation up to
> 0xFFFF (65535) characters as allowed by the two bytes with which we
> encode length. Would you agree with this approach?

(Nit: It needs to be 65535-4 to account for the two length bytes and the
two codepoint bytes.)
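
The arithmetic behind that nit, as a small sketch. The class and constant names 
are made up for illustration; only the numbers come from the discussion above.

```java
// Hypothetical illustration, not Derby code: room left for the name when a
// two-byte length field also has to cover the length and codepoint bytes.
class RdbnamPayloadLimit {
    static final int MAX_TWO_BYTE_LENGTH = 0xFFFF; // 65535
    static final int LENGTH_BYTES = 2;
    static final int CODEPOINT_BYTES = 2;

    // Bytes available for the name itself.
    static int maxNameBytes() {
        return MAX_TWO_BYTE_LENGTH - LENGTH_BYTES - CODEPOINT_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(maxNameBytes()); // 65531
    }
}
```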

> Just to sum up: even if we don't raise the limitation, it doesn't seem
> like my changes will be breaking access to currently existing
> databases as there is indeed a limit currently. The only issue is
> that if we are using strictly Chinese characters, we will indeed be
> capped at 85 characters (85 * 3 bytes = 255 bytes). Since we didn't
> allow Chinese characters on the client driver before this might not
> be bad from a regression perspective but for long paths, this might
> be an issue (as it is even with other characters).

I agree, your suggested changes will be a net improvement, and not have
any known negative sides, so I'm +1 to the changes regardless of whether
or not we end up lifting the 255 bytes limit.

Well, almost no negative sides... We still have the case where we have a
path with no component exceeding the 255 bytes filesystem limitation,
but the complete database name does exceed 255 bytes when converted to
UTF-8. Take this example that works today:

ij> connect 
'jdbc:derby://localhost/æøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæ/øåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøå;create=true';


Here, the database name portion (including create=true) will take 142
characters in EBCDIC, and the filesystem limit is not exceeded because
it's a multi-component path. When encoded in UTF-8, however, the
database name takes 271 bytes and will fail if we have the 255 bytes
limit.
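
The numbers in this example can be reproduced with a short sketch. It rebuilds 
a name of the same shape (129 special Latin characters split into two path 
components) rather than copying the exact string, and the class and method 
names are hypothetical, not Derby code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical check, not Derby code: compares the per-component filesystem
// limit with the size of the whole name once it is encoded as UTF-8.
class PathComponentCheck {
    // Assumption from this thread: 255 bytes per path component (ZFS) and a
    // 255-byte limit on the whole database name sent over the wire.
    static final int ASSUMED_LIMIT = 255;

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Size in UTF-8 bytes of the longest '/'-separated path component.
    static int longestComponentBytes(String path) {
        int max = 0;
        for (String part : path.split("/")) {
            max = Math.max(max, utf8Length(part));
        }
        return max;
    }

    // 129 special Latin characters with a '/' after the first 51 of them.
    static String sampleName() {
        // ae, o-slash, a-ring repeated three times, then c-cedilla (10 chars).
        String pattern =
            "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5\u00e7";
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 129; i++) {
            if (i == 51) {
                sb.append('/');
            }
            sb.append(pattern.charAt(i % pattern.length()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = sampleName();
        String withAttribute = name + ";create=true";

        System.out.println(withAttribute.length());      // 142 characters
        System.out.println(utf8Length(withAttribute));    // 271 bytes
        System.out.println(longestComponentBytes(name));  // 156 bytes

        // Each component is within the filesystem limit...
        System.out.println(longestComponentBytes(name) <= ASSUMED_LIMIT); // true
        // ...but the whole name exceeds the limit once encoded as UTF-8.
        System.out.println(utf8Length(withAttribute) <= ASSUMED_LIMIT);   // false
    }
}
```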

It's probably an edge case, but it would be good to have it resolved
before we cut the release, since it's technically a regression. But I'd
be fine with handling this in a separate JIRA issue after we've switched
to the UTF-8 CCSID manager.

-- 
Knut Anders



      

