db-derby-dev mailing list archives

From: Knut Anders Hatlen <knut.hat...@oracle.com>
Subject: Re: Database name length
Date: Tue, 14 Sep 2010 08:46:00 GMT

Tiago Espinha <tiago.derby@yahoo.co.uk> writes:

> I agree Kathey. The bottom line is that if we don't impose this 63
> character limitation, then the limit will be variable. For instance,
> if you use **just** special Latin characters (i.e. áéçóí), the limit
> will be 127 which is essentially what happens right now albeit in a
> much less elegant way. EBCDIC according to Knut's experiment is able
> to encode these special characters but it does seem like it takes
> more than one byte.
> I tried to create a database with 243 special Latin characters (255 -
> 12 for ;create=true) on a server and it just threw a very
> nasty array bounds exception (check my other e-mail on the list).

It turns out the current limit is not caused by EBCDIC, but rather some
faulty conversion to UTF-8 in the error handling, with the same root
cause as DERBY-4799. When I apply the fix for DERBY-4799 and try to
create a database whose name consists of 129 special Latin characters, I
now see this error:

ij> connect 'jdbc:derby://localhost/æøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøå;create=true';
ERROR XJ041: DERBY SQL error: SQLCODE: -1, SQLSTATE: XJ041, SQLERRMC: Failed to create database
see the next exception for details.::SQLSTATE: XBM0HDirectory /tmp/server/æøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøå
cannot be created.

So it seems it is actually a filesystem limitation (I use ZFS, which Dag
in an earlier posting said has a limit of 255 *bytes* - not chars - per
path component) that would also be seen with the embedded driver.
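
The byte count behind that failure can be checked with a minimal Python
sketch. It assumes the filesystem stores names as UTF-8 and enforces a
255-byte limit per path component, as described for ZFS above. The names
MAX_COMPONENT_BYTES, utf8_len and fits_in_component are made up for
illustration and are not part of Derby.

```python
# Illustrative only: these names are hypothetical, not Derby APIs.
MAX_COMPONENT_BYTES = 255  # assumed per-component filesystem limit, in bytes


def utf8_len(text: str) -> int:
    """Return the number of bytes needed to encode text as UTF-8."""
    return len(text.encode("utf-8"))


def fits_in_component(name: str, limit: int = MAX_COMPONENT_BYTES) -> bool:
    """Return True if a single path component stays within the byte limit."""
    return utf8_len(name) <= limit


# A 129-character name built from the same special Latin characters.
name = "æøåæøåæøåç" * 12 + "æøåæøåæøå"

print(len(name))                # 129 characters
print(utf8_len(name))           # 258 bytes, each character takes 2 bytes
print(fits_in_component(name))  # False, because 258 > 255
```

Under the same assumptions, a name of 127 such characters (254 bytes)
would still fit in one component.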

> Knut and Dag also suggested that we raise this limitation up to
> 0xFFFF (65535) characters as allowed by the two bytes with which we
> encode length. Would you agree with this approach?

(Nit: It needs to be 65535-4 to account for the two length bytes and the
two codepoint bytes.)
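
Spelled out as arithmetic, in a small Python sketch. The constant names
are hypothetical and chosen only for illustration; the sketch assumes
that the two-byte length field counts the length bytes and the codepoint
bytes as well as the payload.

```python
# Hypothetical constants for illustration; not taken from the Derby source.
MAX_OBJECT_LENGTH = 0xFFFF  # largest value a two-byte length field can hold
LENGTH_BYTES = 2            # the length field itself
CODEPOINT_BYTES = 2         # the codepoint that follows the length field

# What remains for the encoded database name, in bytes.
max_payload_bytes = MAX_OBJECT_LENGTH - LENGTH_BYTES - CODEPOINT_BYTES

print(max_payload_bytes)  # 65531
```

Note that this is a byte budget, so the number of characters that fit
still depends on how many bytes each character takes in the encoding.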

> Just to sum: even if we don't raise the limitation, it doesn't seem
> like my changes will be breaking access to currently existing
> databases as there is indeed a limit currently. The only issue is
> that if we are using strictly Chinese characters, we will indeed be
> capped at 85 characters (85 * 3 bytes = 255 bytes). Since we didn't
> allow Chinese characters on the client driver before this might not
> be bad from a regression perspective but for long paths, this might
> be an issue (as it is even with other characters).

I agree, your suggested changes will be a net improvement, and not have
any known negative sides, so I'm +1 to the changes regardless of whether
or not we end up lifting the 255 bytes limit.
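
The figures quoted above (127 special Latin characters, 85 Chinese
characters) follow from integer division of the 255-byte limit by the
UTF-8 width of each character. A rough Python sketch, where max_chars is
a hypothetical helper and the widths are those of the sample characters
shown (some rarer Chinese characters take 4 bytes in UTF-8):

```python
def max_chars(limit_bytes: int, bytes_per_char: int) -> int:
    """Return how many fixed-width characters fit in a byte budget."""
    return limit_bytes // bytes_per_char


# UTF-8 widths of two sample characters.
print(len("é".encode("utf-8")))   # 2
print(len("中".encode("utf-8")))  # 3

print(max_chars(255, 1))  # 255 for plain ASCII names
print(max_chars(255, 2))  # 127 for two-byte Latin characters
print(max_chars(255, 3))  # 85 for three-byte Chinese characters
```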

Well, almost no negative sides... We still have the case where we have a
path with no component exceeding the 255 bytes filesystem limitation,
but the complete database name does exceed 255 bytes when converted to
UTF-8. Take this example that works today:

ij> connect 'jdbc:derby://localhost/æøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæ/øåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøå;create=true';

Here, the database name portion (including create=true) will take 142
characters in EBCDIC, and the filesystem limit is not exceeded because
it's a multi-component path. When encoded in UTF-8, however, the
database name takes 271 bytes and will fail if we have the 255 bytes
limit.
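
The numbers in this example can be reproduced with a short Python
sketch. It builds a name of the same shape (129 two-byte characters
split into two path components, plus ;create=true) rather than copying
the exact string, and it assumes a 255-byte limit applied once per path
component and once to the whole UTF-8 encoded name. MAX_BYTES and
utf8_len are hypothetical names, not Derby code.

```python
MAX_BYTES = 255  # assumed limit, checked per component and for the whole name


def utf8_len(text: str) -> int:
    """Return the number of bytes needed to encode text as UTF-8."""
    return len(text.encode("utf-8"))


latin = ("æøåæøåæøåç" * 13)[:129]        # 129 two-byte characters
db_path = latin[:51] + "/" + latin[51:]  # two path components
db_name = db_path + ";create=true"       # name with the attribute included

component_sizes = [utf8_len(part) for part in db_path.split("/")]
total_bytes = utf8_len(db_name)

print(len(db_name))      # 142 characters, one byte each in a single-byte encoding
print(component_sizes)   # [102, 156], both within the filesystem limit
print(total_bytes)       # 271
print(all(size <= MAX_BYTES for size in component_sizes))  # True
print(total_bytes <= MAX_BYTES)                            # False
```

So each component passes the filesystem check while the full name, once
encoded in UTF-8, does not fit in 255 bytes.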

It's probably an edge case, but it would be good to have it resolved
before we cut the release, since it's technically a regression. But I'd
be fine with handling this in a separate JIRA issue after we've switched
to the UTF-8 CCSID manager.

Knut Anders
