From: Tiago Espinha
Subject: Re: Database name length
To: derby-dev@db.apache.org
Date: Tue, 14 Sep 2010 13:03:51 +0000 (GMT)

Hello everyone again,

I wanted to bottom-line the situation of the name length and define a course of
action. Unless someone objects to it, I will go through with the plan.

1) It is probably better to keep the issue of UTF-8 encoding and length of the
RDBNAM separate.
Because of this, I will go ahead and, after testing, commit my
changes to put UTF-8 in place.

This means there will be a variable length restriction depending on the
characters used but I think this is OK, provided the documentation is updated
accordingly.

2) A new issue will be created to deal with the length of the RDBNAM field. I'm
not sure how the Open Group works so I was hoping someone with more experience
would volunteer to attempt to get this lifted. Alternatively, we can put this as
an extension to the DRDA - I'll leave that discussion to that specific issue, so
that it doesn't put a deadlock on the UTF-8 support.

3) The goal is obviously to not introduce regressions and to make sure we can
still access old databases with Latin characters. I believe this will be ensured
as currently the support for these characters is broken using the client driver.
Knut has done some experiments on DERBY-4799 and I've also run some experiments
of my own, only to find that, for example, I can't create a database with more
than three Latin characters (on 10.5.3.0). Because of this, even if the limit
for Latin characters will now become 127 characters, it will still be an
improvement over what we have right now, which is broken.

In this process I will also fix the bug Knut discovered. There is more
information about this on the issue itself (DERBY-4799).

I think I've covered the main points. If anyone has comments, suggestions or
concerns please feel free to chip in.

Thanks,
Tiago


----- Original Message ----
From: Knut Anders Hatlen
To: derby-dev@db.apache.org
Sent: Tue, 14 September, 2010 9:46:00
Subject: Re: Database name length

Tiago Espinha writes:

> I agree Kathey. The bottom line is that if we don't impose this 63
> character limitation, then the limit will be variable.
> For instance,
> if you use **just** special Latin characters (i.e. áéçóí), the limit
> will be 127 which is essentially what happens right now albeit in a
> much less elegant way. EBCDIC according to Knut's experiment is able
> to encode these special characters but it does seem like it takes
> more than one byte.
>
> I tried to create a database with 243 special Latin characters (255 -
> 12 for ;create=true) on a 10.5.3.0 server and it just threw a very
> nasty array bounds exception (check my other e-mail on the list).

It turns out the current limit is not caused by EBCDIC, but rather some
faulty conversion to UTF-8 in the error handling, with the same root
cause as DERBY-4799. When I apply the fix for DERBY-4799 and try to
create a database whose name consists of 129 special Latin characters, I
now see this error:

ij> connect
'jdbc:derby://localhost/æøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøå;create=true';

ERROR XJ041: DERBY SQL error: SQLCODE: -1, SQLSTATE: XJ041, SQLERRMC: Failed to
create database
'æøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøå',
 see the next exception for details.::SQLSTATE: XBM0HDirectory
/tmp/server/æøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøå
 cannot be created.

So it seems it is actually a filesystem limitation (I use ZFS, which Dag
in an earlier posting said had a limit of 255 *bytes* - not chars - per
path component) that would also be seen with the embedded driver.

> Knut and Dag also suggested that we raise this limitation up to
> 0xFFFF (65535) characters as allowed by the two bytes with which we
> encode length. Would you agree with this approach?

(Nit: It needs to be 65535-4 to account for the two length bytes and the
two codepoint bytes.)

> Just to sum up: even if we don't raise the limitation, it doesn't seem
> like my changes will be breaking access to currently existing
> databases as there is indeed a limit currently. The only issue is
> that if we are using strictly Chinese characters, we will indeed be
> capped at 85 characters (85 * 3 bytes = 255 bytes). Since we didn't
> allow Chinese characters on the client driver before this might not
> be bad from a regression perspective but for long paths, this might
> be an issue (as it is even with other characters).

I agree, your suggested changes will be a net improvement, and not have
any known negative sides, so I'm +1 to the changes regardless of whether
or not we end up lifting the 255 bytes limit.

Well, almost no negative sides... We still have the case where we have a
path with no component exceeding the 255 bytes filesystem limitation,
but the complete database name does exceed 255 bytes when converted to
UTF-8.
Take this example that works today:

ij> connect
'jdbc:derby://localhost/æøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæ/øåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøåçæøåæøåæøå;create=true';


Here, the database name portion (including create=true) will take 142
characters in EBCDIC, and the filesystem limit is not exceeded because
it's a multi-component path. When encoded in UTF-8, however, the
database name takes 271 bytes and will fail if we have the 255 bytes
limit.

It's probably an edge case, but it would be good to have it resolved
before we cut the release, since it's technically a regression. But I'd
be fine with handling this in a separate JIRA issue after we've switched
to the UTF-8 CCSID manager.

-- 
Knut Anders
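
For anyone who wants to check the byte counts discussed in this thread, here
is a minimal Java sketch. It assumes the 255-byte RDBNAM limit mentioned above,
and it uses 'æ' and one CJK character as stand-ins for "special Latin" and
Chinese characters. The class and method names (RdbnamLengthSketch, utf8Length,
fitsLimit, repeat) are made up for illustration and are not part of Derby.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Derby code: reproduces the arithmetic from the thread.
final class RdbnamLengthSketch {
    // Assumption: the 255-byte limit on RDBNAM discussed in the thread.
    static final int MAX_RDBNAM_BYTES = 255;

    // Number of bytes the name occupies once encoded as UTF-8.
    static int utf8Length(String name) {
        return name.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the UTF-8 encoded name fits within the assumed limit.
    static boolean fitsLimit(String name) {
        return utf8Length(name) <= MAX_RDBNAM_BYTES;
    }

    // Small helper so the sketch does not depend on String.repeat (Java 11+).
    static String repeat(String unit, int times) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < times; i++) {
            sb.append(unit);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String latin = "\u00e6";   // 'æ', two bytes in UTF-8
        String cjk = "\u4e2d";     // a CJK character, three bytes in UTF-8

        // 127 two-byte characters fit, 128 do not.
        System.out.println(fitsLimit(repeat(latin, 127)));
        System.out.println(fitsLimit(repeat(latin, 128)));

        // 85 three-byte characters fit, 86 do not.
        System.out.println(fitsLimit(repeat(cjk, 85)));
        System.out.println(fitsLimit(repeat(cjk, 86)));

        // Same shape as the multi-component example above: 51 + 78 Latin
        // characters, one '/', plus ";create=true".
        String multi = repeat(latin, 51) + "/" + repeat(latin, 78) + ";create=true";
        System.out.println(multi.length() + " chars, " + utf8Length(multi) + " bytes");
    }
}
```

The last case is the edge case described above: no single path component
exceeds 255 bytes, yet the whole name does once it is encoded as UTF-8.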